Stable species boundaries despite ten million years of hybridization in tropical eels

Genomic evidence is increasingly underpinning that hybridization between taxa is commonplace, challenging our views on the mechanisms that maintain their boundaries. Here, we focus on seven catadromous eel species (genus Anguilla) and use genome-wide sequence data from more than 450 individuals sampled across the tropical Indo-Pacific, morphological information, and three newly assembled draft genomes to compare contemporary patterns of hybridization with signatures of past introgression across a time-calibrated phylogeny. We show that the seven species have remained distinct for up to 10 million years and find that the current frequencies of hybridization across species pairs contrast with genomic signatures of past introgression. Based on near-complete asymmetry in the directionality of hybridization and decreasing frequencies of later-generation hybrids, we suggest cytonuclear incompatibilities, hybrid breakdown, and purifying selection as mechanisms that can support species cohesion even when hybridization has been pervasive throughout the evolutionary history of clades.


Supplementary Notes
Supplementary Note 1: Contrasting patterns of within-species genomic variation. Our extensive sampling scheme of tropical eels permitted detailed analyses of genomic variation within A. marmorata, A. megastoma and A. obscura, as we sampled each of the three species at multiple sites throughout their geographic distribution ( Fig. 1a; Supplementary Table 1). These analyses were based on a dataset of 155,896 RAD-sequencing derived single-nucleotide polymorphisms (SNPs), partitioned according to species and subsequently filtered to exclude invariant sites (minor allele count > 2) and missing data (> 20%; Supplementary Figure 1). Using these partitioned datasets, principal-component analysis (PCA) of genomic variation was performed with smartpca in EIGENSOFT v.6.0.1 [1], including the function "lsqproject" to account for missing data. For A. marmorata, PCA separated four populations present in the western Indian Ocean (South Africa, Reunion, Mayotte), in Indonesia (Java), the South China Sea (Philippines and Taiwan), and the western South Pacific (Bougainville Island, Solomon Islands, Vanuatu, New Caledonia, Samoa, and American Samoa). The latter three, however, were only discernible on the second principal-component axis (explaining 2.2% of genetic variation), along which the individuals from Java appeared intermediate between those from the western Indian Ocean and the western South Pacific (Supplementary Figure 4a). Our results are thus consistent with divergence among Indian and Pacific ocean populations [2][3][4][5] and with the region of Java representing a contact zone between those populations [5]. Conversely, no population structure was detected in either A. megastoma or A. obscura ( Supplementary Figures 4c-f), supporting the hypothesized single spawning area for the two species in the western South Pacific [6,7].
Supplementary Note 2: Double-digest restriction-site associated DNA (ddRAD) sequencing. Following Peterson et al. [8], 20 units of EcoRI-HF (New England Biolabs) and 20 units of MspI (New England Biolabs) were used to digest 400 ng of genomic DNA per sample in a 37 • C incubation for 8 hours. Digests were purified with homemade paramagnetic carboxyl-modified beads (Sera-Mag, Fisher Scientific) [9]. Moreover, samples were randomly placed in PCR plates, ensuring a wide coverage of geographic sampling locations per plate during library preparation. T4 DNA ligase (New England Biolabs) was applied to ligate 100 ng of each digested DNA fragment to a EcorRI-specific P1 adapter that contained a 5-bp barcode and the MspI-specific P2 adapter in room temperature, followed by an enzyme heat-kill at 65 • C for 10 min. Twenty-four unique barcodes were used so that the ligated DNA fragments from 24 individuals could be pooled (according to one index) to form a single ddRAD-seq library. Ligations were cleaned with homemade paramagnetic carboxylmodified beads. Fragments in the range of 300-400 bp were selected using AMPure XP beads (Agilent Technologies) and were subsequently amplified by 12 rounds of PCR with the following conditions: 98 • C for 60 s; 12 cycles of 98 • C for 10 s, 60 • C for 30 s, 72 • C for 30 s; 72 • C for 10 min using Q5 ® High Fidelity polymerase (New England Biolabs), and the Illumina sequencing primers (PCR Primer 1 and Index added PCR Primer 2; four unique 6-bp multiplexing indices were used). In total, six separate PCR reactions of the same index were amplified, pooled and cleaned with homemade paramagnetic carboxyl-modified beads. Library quality was assessed on a TapeStation 2200 (Agilent Technologies) to confirm fragment recovery on the selected range and was quantified using a Qubit Fluorometer 2.0. To avoid index-hopping [10,11], none of the samples shared both the P1 barcode and the multiplexing index in a given sequencing run. In total, 20 libraries, each containing 24 barcodes, were sent to Macrogen (Korea) for 100 bp paired-end Illumina HiSeq 4000 sequencing.
Supplementary Note 5: Identification of genomic regions with potential structural rearrangements. To identify structural rearrangements among eel genomes that could potentially be linked to cytonuclear incompatibilities, we performed whole-genome alignment with the program LASTZ v.1.04 [27]. This was done separately for the assemblies of A. japonica [21], A. marmorata, A. megastoma, and A. obscura that were all aligned to the A. anguilla reference genome assembly [28], after excluding scaffolds shorter than 100,000 bp from this reference assembly. To prepare all genome assemblies for whole-genome alignment, we first identified and soft-masked repetitive regions with both Tandem Repeats Finder (TRF) v.4.07b [29] and RepeatMasker v.1.0.8 (http://www.repeatmasker.org). For TRF, we applied a matching weight of 2, a mismatching penalty of 7, an indel penalty of 7, a match probability of 80, and indel probability of 10, and limited the report to alignments with a minimum score of 50 and a maximum period size of 2,000. RepeatMasker was run with default settings, using a repeat library generated ab initio with the associated tool RepeatModeler. Various Genome Browser and BLAT utilities (http://hgdownload.soe.ucsc.edu/admin/exe) [30] were used for format conversions. Subsequent to alignment with LASTZ, the tool single cov2 from the MULTIZ-TBA program package v.012109 [31] was applied to remove overlapping regions from aligned blocks in the whole-genome alignments.
We then investigated each of the four pairwise whole-genome alignments for signals of structural genomic rearrangements between A. anguilla and the other eel species, focusing on alignment blocks with a minimum LASTZ score of 50,000 and a minimum length of 1,000 bp. Specifically, when the same two scaffolds appeared in multiple alignment blocks, we recorded a potential inversion if some of these alignment blocks differed in their orientation and a potential transposition if the aligned fragments from the two scaffolds did not appear in the same order on those scaffolds. For each potential rearrangment, the start and end positions of adjacent alignment blocks were used to localize the rearrangement within a region of the A. anguilla reference assembly. Out of 378 potential rearrangements, we excluded 110 that could not be localized more precisely than within 10 kb from further analyses. We unified the set of potential rearrangement regions across the four pairwise whole-genome alignments by merging overlapping regions on the A. anguilla reference assembly. For example, if a potential rearrangement between A. marmorata and A. anguilla was found between positions 1 kb and 3 kb on a given A. anguilla scaffold and a potential rearrangement between A. megastoma and A. anguilla was detected between positions 2 kb and 4 kb on the same A. anguilla scaffold, we assumed that a rearrangements may have occured at the same position in both pairs, between 2 kb and 3 kb of the scaffold. This produced a set of 255 regions for which one or more of the four whole-genome alignments supported the presence of a rearrangement.
Supplementary Note 6: Verification of the presence or absence of rearrangements among eel genomes. For each of the 255 regions with potential rearrangements (see Supplementary Note 5), we inspected once again each of the original four pairwise whole-genome alignments with more permissive filtering thresholds as before to determine the presence or absence of the rearrangement between the species pair. Specifically, we first determined, for each region and each pairwise whole-genome alignment, the scaffolds in the alignment blocks closest to the start and the end of the region, and, if any, the scaffolds in alignment blocks completely within the region. If a single alignment block or multiple closely spaced (< 500 bp) alignment blocks (with sequences in identical orientation and order) spanned the entire region, the rearrangement was recorded to be absent between the species pair. If multiple more distantly spaced (≥ 500 bp) alignment blocks were localized within the region and included the same non-reference scaffold, an inversion was recorded if the alignment blocks differed in their orientation, and a transposition was recorded if the order of aligned fragments differed between the reference and the non-reference scaffold. When different non-reference scaffolds were included in alignment blocks covering the region, with LASTZ alignment scores greater than 50,000 and alignment lengths greater than 500 bp, we recorded that the presence of the rearrangement was unknown in the given species pair.
For each region, we then determined whether it was within or close to a coding sequence on the A. anguilla reference genome assembly according to gene prediction with AUGUSTUS v.3.3.3 [32] (see Supplementary Note 7). If a coding sequence overlapped with the region or was located within 2,000 bp upstream or downstream of it, we used BLASTP [25] searches to compare the translated coding sequences to the zebrafish (Danio rerio) proteome (assembly version GRCz11; NCBI accession GCA 000002035.4 [33]) to identify homologous proteins. We further generated dot plots for each pair of scaffolds with a putative rearrangement, and visually assessed whether or not the resulting 874 dot plots confirmed the presence of rearrangements. A list of all detected potential rearrangements, with or without visual confirmation, is provided in Supplementary Table 11. Nine regions on eight different A. anguilla scaffolds were visually confirmed to contain rearrangements that were species-specific to either A. marmorata, A. megastoma, or A. obscura; dot plots for these regions are shown in Supplementary Figure 21.
As errors in the assemblies of A. marmorata, A. megastoma, and A. obscura could potentially generate the same patterns as true rearrangements in dot plots, we further investigated the WGS reads of these species mapped to either the A. anguilla reference genome assembly and the speciesspecific assembly. For true rearrangements, we expected a rather homogenous depth of reads mapped to the species-specific assembly but short regions with a depth approaching zero in the reads mapped to the A. anguilla assembly. We also expected reads without proper pairing at the boundaries of true inversions when reads are mapped to the A. anguilla assembly, but not when these are mapped to the species-specific assembly. In contrast, we expected putative errors in the species-specific assemblies to produce regions with a read depth close to zero as well as large proportions of reads without proper pairing when reads are mapped to species-specific assemblies but not when these are mapped to the A. anguilla assembly. We mapped reads to the A. anguilla and species-specific assemblies with BWA MEM v.0.7.17 [34] and subsequently used SAMtools v.1.3 to remove all unmapped reads, reads with a mapping quality below 40, reads without mapped mates, duplicate reads, and reads that were placed in supplementary alignments (SAM flag 3596). From the remaining set of high-quality reads, we identified reads without proper pairing as those that had a mate on the same strand or within unexpected distance (SAM flags 2 and 48).
From the resulting distributions of reads in putative recombination regions (also shown in Supplementary Figure 21), we conclude that species-specific inversions are present in A. obscura on scaffold scf7180011634263 (Supplementary Figure 21a) and in A. megastoma on scaffold scf7180010922797 (Supplementary Figure 21d). In addition, we find evidence for the heterozygous presence of inversions in A. obscura on scaffold scf7180011662322 (Supplementary Figure 21g Figure 22b). The latter of these occurs in a region that is homologous to exon 35 of the myhc4 gene in zebrafish (Danio rerio) (Supplementary Figure 22b). The inversion may thus affect the protein encoded by this gene, myosin heavy chain, in A. megastoma.

Supplementary Note 7:
De novo gene prediction for the A. anguilla reference genome assembly. We performed de novo gene prediction for the A. anguilla reference genome [28] using the AUGUS-TUS v.3.3.3 web server (http://bioinf.uni-greifswald.de/webaugustus/prediction) [32]. We applied the zebrafish (Danio rerio) training set, excluded the prediction of untranslated regions and alternative transcripts, and set AUGUSTUS to report genes on both strands while considering potential conflict between the two strands.

Supplementary Figures
Supplementary Figure 1: Molecular datasets used in this study.

Genetic distance
Supp. Table 5 IQ-TREE Supp. Fig. 17a Genalogy interrog.   a-b) Comparison of first and second (a), and third and fourth (b), principal components of genomic variation in A. marmorata. Fill and stroke colors indicates geographic origin. The individual VAG12033 is characterized by a large proportion of missing data (see Supplementary Table 1), which might explain its outlier positions. c-d) Comparison of first and second (c), and third and fourth (d), principal components of genomic variation in A. megastoma. e-f ) Comparison of first and second (e), and third and fourth (f), principal components of genomic variation in A. obscura. Even though A. obscura individuals appear to form three clusters along PC1 (e), these are not correlated to sampling site, sampling year, morphology, or proportion of missing data, and are therefore not further discussed. Putative between-species hybrids (see Supplementary Table 7) were exluded from this analysis.
Supplementary Figure 5: Genomic variation among tropical eel species. a) First and second principal components of genomic variation among the seven species A. marmorata (cyan), A. luzonensis (brown), A. megastoma (purple), A. obscura (red), A. bicolor (blue), A. interioris (magenta), and A. mossambica (green). b) Third and fourth principal components of genomic variation among the seven species. c) First and second principal components of genomic variation, focusing on the four species marmorata (cyan), A. megastoma (purple), A. obscura (red), and A. interioris (magenta). d) Third and fourth principal components of genomic variation, for the same four species as in c). The two individuals VAG12033 and PHP14P16 are characterized by a large proportion of missing data (see Supplementary Table 1  a-b) Maximum-clade-credibility (MCC) summary trees of SNAPP analyses with 5,000 transition or transversion sites, respectively, without topology constraints and a single age constraint on the root divergence according to [12] (J2014). Node bars indicate 95% highest posterior density intervals intervals. The gray area marks the group of species for which we find evidence of past and ongoing hybridization, and the dotted line indicates the crown age of this group. c-d) As a-b), but with a topology constraint on the position of A. interioris as the sister species to a clade combining A. bicolor and A. obscura; this position is supported by maximum-likelihood inference with IQ-TREE (Supplementary Figure 19). e-f ) As a-b), but with a topology constraint on the position of A. interioris as the outgroup to a formed by A. bicolor, A. obscura, A. marmorata, and A. luzonensis; this position is supported by genealogy interrogation (Fig. 3a). g-h) As a-b), but excluding the the two species A. luzonensis and A. interioris due to their strong signals of ancient hybridization. i) MCC tree of a BEAST analysis with 103 nuclear genes. Sequences of these genes from A. anguilla and A. japonica were included in [18] (M2019) and were here complemented with orthologs extracted from the new genome assemblies of A. marmorata, A. megastoma, and A. obscura. A single age constraint on the root was used for calibration according to [18]. j) MCC tree of a BEAST analysis with four mitochondrial genes that were used in [19] (R2018). The dataset of [19] was here complemented with orthologs from the three new genome assemblies (A. marmorata, A. megastoma, and A. obscura), and a single age constraint on the root was used according to [19]. Note that the timescale in j) differs from a-i). Unless specified, all nodes received full Bayesian support.    TAI15005  PHC08P22  TAI15006  PHP14P13  PHP14P12  PHP14P19  PHP14P04  TAI15007  TAI15021  TAI15025  TAI15009  TAI15008  TAI15022  TAI15010  PHP14P26  PHP14P25  PHP14P27  TAI15026  PHP14P28  PHP14P24  PHP14P30  PHP14P29  TAI15024  TAI15029  TAI15030 TAI15028  PHP14P03  PHP14P18  TAI15016  TAI15023  PHP14P14  PHP14P09  PHP14P05  PHP14P10  PHP14P02  PHP14P17  PHP14P11  TAI15017  TAI15003  PHP14P01  PHC08P23  PHC08C10  TAI15027  TAI15013  TAI15019  TAI15012  TAI15020  TAI15004  TAI15014  TAI15018  TAI15002  PHP14P06  PHP14P08  TAI15001  PHP14P15  PHC08C25  TAI15015  PHP14P21  TAI15011 TAI15005  PHC08P22  TAI15006  PHP14P13  PHP14P12  PHP14P19  PHP14P04  TAI15007  TAI15021  TAI15025  TAI15009  TAI15008  TAI15022  TAI15010  PHP14P26  PHP14P25  PHP14P27  TAI15026  PHP14P28  PHP14P24  PHP14P30  PHP14P29  TAI15024  TAI15029  TAI15030 TAI15028  PHP14P03  PHP14P18  TAI15016  TAI15023  PHP14P14  PHP14P09  PHP14P05  PHP14P10  PHP14P02  PHP14P17  PHP14P11  TAI15017  TAI15003  PHP14P01  PHC08P23  PHC08C10  TAI15027  TAI15013  TAI15019  TAI15012  TAI15020  TAI15004  TAI15014  TAI15018  TAI15002  PHP14P06  PHP14P08  TAI15001  PHP14P15  PHC08C25  TAI15015  PHP14P21  TAI15011

A. bicolor
Coancestry was investigated based on RAD loci with fineRADstructure [39]. Heatmap colors indicate numbers of RAD loci with estimated shared coancestry. Individuals are listed on both axes in the same order, clustered according to the tree shown on top of the heatmap [40]. Note that even though A. luzonensis appears to have more shared coancestry with the South China Sea population of A. marmorata than with other populations, introgression between A. luzonensis and the South China Sea population of A. marmorata is not supported by D and f 4 statistics (Supplementary Table 10). Figure 9: Ancestry painting for A. marmorata and A. megastoma.

Supplementary
Ancestry painting for 73 "core" A. marmorata individuals, 26 "core" A. megastoma individuals, and 20 recent hybrids between the two species. In addition, one A. marmorata individual (BOU15024) was included because it was initially assumed to be a hybrid based on morphological measurements (Supplementary Figure 3a); this assumption is not supported by the ancestry painting shown here. Horizontal bars indicate the genotypes at each of 302 sites fixed between the two parental species. White color indicates missing data. Heterozygous genotypes are shown with the top half in each bar matching the second parental species and vice versa. Light gray cells in the morphology column indicate individuals not classified into any of the "core" groups. The species' color code is identical to Supplementary Figure  Information shown here is a part of that presented in Supplementary Figure 9, focusing only on backcrossed hybrids and their genotypes on scaffolds with at least three fixed sites. Scaffold IDs are given on top and the positions of the first and last of the sites fixed on this scaffold are given below the ancestry painting. A single change from heterozygous to homozygous states or vice versa occurs six times among the seven individuals and two such changes on the same scaffold occur twice. With a mean distance of 1,708,024 bp and a maximum distance of 4,654,779 bp between the first and the last of the sites assessed on these 22 scaffolds, recombination breakpoints therefore appeared to be rare, in agreement with the interpretation of these individuals as backcrossed second-generation hybrids.
Supplementary Figure 11: Ancestry painting for A. marmorata and A. obscura.
Ancestry painting as in Supplementary Figure 9, but for 73 "core" A. marmorata individuals, 26 "core" A. obscura individuals, and 3 recent hybrids between the two species. Horizontal bars indicate the genotypes at each of 742 sites fixed between the two parental species.  Horizontal bars indicate the numbers of individuals collected at 14 sampling locations, counting only those with sufficient sequence quality that were used in genomic analyses. Hybrid frequencies are shown as black proportions of these bars.  The triangle plot illustrates the proportions of nuclear genomes derived from the maternal species and heterozygosity according to Pulido-Santacruz et al. [41]. Both the proportions of the nuclear genomes derived from the maternal species, f m,genome , and the heterozygosities were calculated from genotypes at fixed sites as described in the main text. Our interpretations of individuals as unadmixed, F1, or backcross with the maternal or paternal species are indicated. Morphological measurements followed Watanabe et al. [37]. Individuals identified as hybrids in Fig. 2 are marked with specimen IDs, excluding hybrids from Samoa (SAW) and American Samoa (SAA) and one hybrid from Vanuatu (VAG; VAG13087) for which the displayed measurements were not available (Supplementary Table 1       A. marmorata    A. marmorata (g = 12, µ = 8.6×10 -9 )

Supplementary
A. megastoma (g = 10, µ = 5.6×10 -9 ) A. obscura (g = 6, µ = 5.2×10 -9 ) Changes in effective population size (N e ; vertical axis) over time (the last 1 myr; horizontal axis) estimated using the pairwise sequential Markovian coalescent (PSMC). The PSMC was applied to WGS data of one individual for each of the three species A. marmorata, A. megastoma, and A. obscura. Estimates were based on assumed generation times (g) between 6 and 12 years and mutation rates (µ) between 5.2 and 8.6 × 10 −9 mutations/site/generation. Semi-transparent colored lines correspond to 100 bootstrap replicates. For visualization purposes, the range of contemporary N e values is truncated for A. megastoma; the maximum bootstrap value is N e = 2.0 × 10 6 . Note that the apparent bottleneck pattern seen in all three species could be an artifact as it is expected even without actual population-size decline when parts of the genome are affected by introgression [44].
Supplementary Figure 21: Genomic rearrangements supported by WGS data. Possible inversions and transpositions between eel genome assemblies are visualized as dot plots, in which each dot represents a tuple of nine nucleotides that are identical or the reverse complement between the two scaffolds on the x-and y-axes. Dot plots are shown for each comparison of eight different regions on scaffolds of the A. anguilla reference genome assembly [28] (columns; a-h) and the corresponding scaffolds in each of the four genome assemblies for A. japonica [21] (gray; top row), A. marmorata (cyan; second row from top), A. megastoma (purple; third row from top), and A. obscura (red; bottom row). The eight regions on A. anguilla scaffolds were identified as candidate regions for rearrangements through a combination of automated and manual approaches as described in Supplementary Notes 5-6 and Supplementary Table 11. In the absence of rearrangements, dots are expected to fall along a single diagonal line, whereas transpositions produce multiple lines with the same orientation, and inversions lead to a combination of lines with different orientations. However, errors in genome assemblies can also produce these very same patterns.
Below each dot plot, we show in light colors the depths of A. marmorata, A. megastoma, or A. obscura reads mapped to the A. anguilla scaffold; read data of A. japonica were not available. These depth distributions are scaled so that the top of the plot corresponds to 200 reads. The solid line in the same plot shows the distribution of reads without proper pairing of mates (e.g. both mates sharing the same orientation or being at an unexpected distance to each other). These reads without proper pairing are expected from genomic rearrangements, close to the boundaries of the rearrangements. Distributions of reads without proper pairing are scaled so that the top of the plot corresponds to 50 reads.
To the left of each dot plot, we also show overall read depths (again in light colors and scaled to a maximum of 200 reads) and distributions of reads without proper pairing (again as a solid line and scaled to a maximum of 50 reads); however, in these cases for reads mapped to scaffold of the A. marmorata, A. megastoma, or A. obscura assemblies.
True rearrangments between A. anguilla and A. marmorata, A. megastoma, or A. obscura are expected to produce peaks in the distributions of reads without proper pairing when A. marmorata, A. megastoma, or A. obscura reads are mapped to the A. anguilla assembly, but no such peaks when the same reads are mapped to the species-specific assembly. In contrast, errors in the assemblies for A. marmorata, A. megastoma, or A. obscura are expected to produce peaks in the distributions of reads without proper pairing when reads are mapped to the species-specific assembly.
Dashed horizontal lines demarcate contig boundaries on A. japonica, A. marmorata, A. megastoma, and A. obscura scaffolds. Gray rectangles indicate coding sequences on A. anguilla scaffolds according to the gene prediction with AUGUSTUS.
a-c) Potential rearrangements mapping to scaffolds scf0058 (a), scf0842 (b), and scf1091 (c) of the A. anguilla reference genome assembly [28]. The absence of reads without proper pairing on the A. obscura scaffold scf7180011634263, together with peaks of these reads on the A. anguilla scaffold scf0058 confirms the presence of an inversion between these two scaffolds (a).

Supplementary Tables
Supplementary Table 1: Sampled specimens. Underlined specimens were included in the "core" group of individuals for A. marmorata, A. megastoma, and A. obscura, based on morphological measurements characteristic for the species. Species assignment is based on mitochondrial sequence data, and additionally on morphological measurements when these were available. Solomon Islands sampling sites SOK (Kolombangara), SOL (Kolombangara), SON (Nggatokae), SOR (Ranongga), and SOV (Vangunu) are jointly labeled "SO" in Figure 1, Supplementary Figure 4a, and descriptions in the text. The availability of morphological information is indicated in the morphology column. Unless specified otherwise, all specimens for which morphology information was available were included in morphological principal component analysis. 1 specimen removed due to read number below 600,000; 2 specimen not included in morphological principal component analysis even though morphological information is available; 3 specimen removed due to percentage of mapped reads below 70%.  Table 2: Per-species population-genetic parameters for tropical eel species. Reported parameters are calculated for the dataset for phylogenetic analyses (before reducing it to maximally five individuals per species). Heterozygosity (h), nucleotide diversity (π), and population mutation rate (Θ) were calculated twice; first assuming that all missing data are invariable, and second assuming that missing data mask genotypes that are equally variable as the observed ones. Species   Note that for backcrossed hybrids, the identity of maternal and paternal species are ambiguous because either the mother or the father is a hybrid itself; in this case, the name listed as maternal species indicates the species identity of the mitochondrial genome. h fixed , heterozygosity at sites fixed between parental species; f m,genome , proportion of genome derived from the maternal species; f m,morphology , similarity to morphology of maternal species, relative to morphology of paternal species. The mitochondrial genome of A. marmorata and the f m,genome around 0.25 indicate that the mother of the mother of this individual was an A. marmorata but all other grantparents were A. megastoma. 2 The mitochondrial genome of A. marmorata and the f m,genome around 0.75 indicate that one of the grandparents of this individual, but not the mother of the mother, was an A. megastoma and all other grandparents were A. marmorata. 3 The f m,genome between 0.25 and 0.5 could indicate that one of the grandparents of VAG13087 was itself admixed.
However, as VAG13087 has a comparatively large proportion of missing data and its genotypes are available for only 125 of 302 fixed sites, we do not consider this evidence of earlier admixture reliable enough to warrant further discussion. 4 For comparison, we calculated h fixed and f m,genome for all A. marmorata, A. megastoma, and A. obscura individuals except "core" individuals (because these had been used to identify fixed sites), the identified hybrids listed above, and individuals with less than 2 million reads. After excluding these, 210 individuals remained for the analysis, of which 194 were mitochondrially assigned to A. marmorata, 12 to A. megastoma, and 4 to A. obscura.   Only a single quartet involving A. marmorata appeared significant in Supplementary Table 9 and was tested further with separate populations. In addition, the possibility of different degrees of introgression between A. luzonensis and the four A. marmorata populations was explored because A. luzonensis appeared to share more coancestry with the South China Sea population of A. marmorata than with other populations in the fineRADstructure analysis (Supplementary Figure 8) Evidence for possible genomic rearrangements from pairwise whole-genome alignment between the A. anguilla reference genome assembly [28] and either the A. japonica (jap) genome assembly [21] or one of the newly generated genome assemblies for A. marmorata (mar), A. megastoma (meg), and A. obscura (obs). Scaffold IDs and regions refer to the A. anguilla reference genome assembly. Possible inversions ("i") and transpositions ("t") were recorded when sequences in multiple alignment blocks between the A. anguilla assembly and another assembly showed different orientations or orders, respectively (see Supplementary Notes 5-6 for details). When a single alignment block or multiple closely spaced alignment blocks (with sequences in identical orientation and order) for the same two scaffolds spanned the whole region, the absence ("n") of a rearrangement was recorded. A question mark indicates insufficient support for inversions, transpositions, or their absence. Lower-case records ("i", "t", and "n") indicate support obtained with an automatic rearrangement identification pipeline while upper-case records in bold ("I", "T", and "N") indicate additional support from visual inspection of dot plots comparing pairs of scaffolds. The distance to the nearest coding sequence in the A. anguilla assembly, according to the gene prediction with AUGUSTUS [32] (see Supplementary Note 7), is given; a distance of 0 indicates that the region overlaps with a coding sequence. In cases where these distances are shorter than 2,000 bp, we used BLASTP [25] searches to compare the translated coding sequences to the zebrafish (Danio rerio) proteome (assembly version GRCz11; NCBI accession GCA 000002035.4 [33]) to identify homologous proteins. Protein IDs, e-values, and percentages of identical amino acids (p ident ) are reported for the best match in each BLASTP search. Underlined sets of records indicate genomic rearrangements that are supported by both the identification pipeline and the visual inspection as species-specific, affecting only one of A. marmorata, A. megastoma, and A. obscura. These nine rearrangements with comparatively strongest support are visualized in Supplementary Figure 21 (jointly in the case of two rearrangements identified on scaffold scf0058).