Ancient hybridization fuels rapid cichlid fish adaptive radiations

Understanding why some evolutionary lineages generate exceptionally high species diversity is an important goal in evolutionary biology. Haplochromine cichlid fishes of Africa's Lake Victoria region encompass >700 diverse species that all evolved in the last 150,000 years. How this ‘Lake Victoria Region Superflock' could evolve on such rapid timescales is an enduring question. Here, we demonstrate that hybridization between two divergent lineages facilitated this process by providing genetic variation that subsequently became recombined and sorted into many new species. Notably, the hybridization event generated exceptional allelic variation at an opsin gene known to be involved in adaptation and speciation. More generally, differentiation between new species is accentuated around variants that were fixed differences between the parental lineages, and that now appear in many new combinations in the radiation species. We conclude that hybridization between divergent lineages, when coincident with ecological opportunity, may facilitate rapid and extensive adaptive radiation.

Estimating very recent divergence times with relaxed molecular clock approaches is challenging due to non-linearity of the clock rate in the very recent time range (reviewed in Ho,et al. 82 ). Evidence for increasing clock rates towards the present has been accumulating in a wide range of taxa, where close to the present the molecular clock approaches the mutation rate, but moving backwards in time this rate declines exponentially, finally approximating the much lower substitution rate 82 . Generally, substitution rates are only reached after about one million years 82,83 a finding in agreement with cichlid mitochondrial markers 37 . No molecular clock inference approaches incorporating this recent time dependency yet exist.
Thus the calibration approach we have taken is useful to infer split times for divergence events more than one million years ago, but will result in an overestimate of recent divergence times. However, given that the node we aim to date (Congo-Nilotic divergence time) is in the time range where the molecular clock is constant and its age is similar to the age of the calibration nodes, we believe that this problem should not affect the estimate of the divergence time between the Congolese and the Upper Nile lineage. To get an indication of relative ages in the very recent part of the tree, we have added a clade endemic to a lake for which the geological age is well-constrained (Lake Nshere, 50,000 years) 37,84 .
The cyto-nuclear discordance of "Haplochromis" gracilior Thoracochromis pharyngalis and "Haplochromis" gracilior are sister species in the nuclear genome and they behave identically in the ABBA-BABA tests showing strong signals of having contributed genes to the admixed ancestry of all tested Lake Victoria Region radiations. Lake Kivu used to drain to the North, feeding Lake Edward and the upper Nile until the end of the Pleistocene, and only underwent flow reversal to feed into Lake Tanganyika and the Congo after major volcanic activity had blocked the northern outlet 2 .
So a sister species relationship between T. pharyngalis and H. gracilior makes sense biogeographically.
However, the mitochondrial haplotype of H. gracilior is distinct and deeply divergent both from T.
pharyngalis and the close relatives of the mitochondrial lineage of the latter, the taxa found in the Eastern Tanzanian Rivers. This finding suggests that H. gracilior and/or T. pharyngalis have a history of hybridization with mitochondrial capture from a third distinct lineage that has generated this cyto-nuclear discordance. There is no other species known that has a mitochondrial haplotype closely related to the one found in H. gracilior making further investigations of its evolutionary history difficult, though not impossible with whole-genome sequences 3 . Given that H. gracilior behaves identically to T. pharyngalis in all tests for introgression into the ancestor of all Lake Victoria Region radiations, the history of hybridization in H. gracilior does not affect the conclusions about the Congo-Nilotic hybridization event at the base of the LVRS.

LWS opsin haplotypes
The haplotypes found in the Congolese and Upper Nile lineage taxa, respectively, have basal positions in the two major LWS opsin haplotype clades found in the LVRS, often referred to as classes I and II 4 (Fig. 4, Supplementary Fig. 7). These haplotype classes differ in many amino acid substitutions, some of which have known effects on peak wavelength sensitivity. Our finding of ancestry informative intronic SNPs and a 6 bp long indel (Supplementary Data 3) rules out the possibility that strong parallel selection and sequence convergence could explain the similarity between the two major haplotype classes in the LVRS and the  178 in bovine rhodopsin) is expected to disrupt hydrogen bonds with nearby amino acids, which may lead to a shift in absorbance sensitivity 4 . Also the five additional non-synonymous substitutions that differ between Congolese and Upper Nile taxa, and are recombined in various ways in the LVRS, are all located in transmembrane helices, suggesting that some of them may cause functional differences between opsin genes too.

Sorting of ancestral alleles
The ancestral hybridization event that we infer here may have facilitated subsequent adaptive radiation by more generally providing both ecologically relevant genetic variation, variation in traits important to mate choice, and reproductive incompatibilities that could be sorted differentially among newly arising species.
We tested this hypothesis beyond the LWS gene by studying candidate loci for divergent selection or involved in reproductive isolation among species in the Lake Victoria radiation. We chose six species, sampled from exact sympatry, that represent the morphological and ecological diversity of the Lake Victoria radiation: Pundamilia pundamilia (blue male nuptial coloration, benthic insectivore-omnivore), P.
Of the 12,890 SNPs in these species, 340 (3%) were outliers of exceptionally high global F ST between the species (LV outiers, p<0.05, Fig 5a) and hence candidate loci for divergent selection or reproductive isolation. We identified both alleles present in the Congolese taxa at 1,053 of the 12,890 SNP sites, of which 83 (8%) were outliers in the Lake Victoria species (Fig. 5a, category 2). These alleles would also have been present as standing variation in the radiation ancestor without Upper Nile hybridization. However, 1,038 other sites had an alternative allele segregating in the Lake Victoria species that was only found in the Upper Nile taxa (Fig. 5a, category 3). Of these, 101 (10%) were Lake Victoria outlier loci. Even though some of these alleles may represent unsampled alleles also segregating in the Congolese lineage, these results indicate that the opportunity for divergence at a large proportion of the Lake Victoria outlier loci (101 of 340) was introduced through hybridization with the Upper Nile lineage.
Next, we tested whether sites fixed for alternative alleles in the Congolese and Upper Nile taxa were enriched for outlier loci among Lake Victoria species. This would be expected if sites fixed for alternative alleles in the parental lineages were more often involved in incompatibilities or were otherwise under stronger divergent selection than sites with alleles segregating within either of the parental lineages. We found 100 sites that were fixed for alternative alleles in Congolese and Upper Nile taxa but had both alleles present in Lake Victoria (Fig. 5a, category 4). Of these sites, 20% were significant high global F ST outliers among the six Lake Victoria species. As these hybridization-derived polymorphisms may have had a higher starting allele frequency compared to ancestral polymorphisms segregating at lower frequencies within the parental lineages, they may have been easier targets for diversifying selection. Just prior to the onset of the radiation in Lake Victoria we would estimate the minor allele frequency for these SNPs as ~16%,  Table 4). To test if the higher starting allele frequency in the Lake Victoria radiation alone could explain the enrichment for Lake Victoria outliers of sites fixed for alternative alleles in the Congolese and Upper Nile relatives of the radiation, we created a dataset of sites with comparable expected ancestral allele frequencies in the Lake Victoria radiation that were not fixed in the parental lineages. We approximated the ancestral allele frequency for the Lake Victoria radiation at each SNP as the weighted ancestral allele frequency, i.e. by multiplying the allele frequencies in the Congolese and Upper Nile taxa by 0.84 and 0.16, respectively. We then extracted all SNPs with a weighted minor allele frequency of 14-18%.
We obtained 179 sites including 18 Lake Victoria high global F ST outliers (Fig. 5a, category 5). The dataset of sites fixed for alternative alleles in the parental lineages contains a significantly higher proportion of Lake Victoria outliers compared to this dataset (0.2 vs 0.1, two-sided Fisher's exact test: p-value = 0.017). The same holds true if the test was repeated assuming an Upper Nile ancestry proportion of 10% to 30% (LV outlier proportions in all datasets ranging from 8-12%, data not shown).
Sites fixed for alternative alleles in the Congolese and Upper Nile lineage may be enriched for LV outliers because of some features of these genomic sites increasing their probability to become fixed in general (e.g. due to variation in mutation or recombination rates or purifying selection). To test for that, we checked if LV outliers fixed for alternative alleles in the parental lineages were also between other species more often fixed for alternative alleles than non-outliers. We used all species from outside the radiation  Figure 4b shows that the high global F ST at these sites is not driven by a single species that differs at all sites from the others or by species differences in overall Congolese and Upper Nile ancestry proportions, but rather that at each site each species (almost) fixed one or the other parental allele in a mosaic fashion. Nodes with a posterior probability of less than 70 are collapsed. Horizontal bars show the 95% highest posterior density (HPD) intervals around the mean ages. As phylogenetic tree inference methods assumes a time-constant clock, whereas in fact the clock rate decays exponentially up to one million years 10-12 , ages of divergence events younger than 1 Ma are strongly overestimated and cannot be read from our graph (indicated by grey rectangles over the time scales). Note, that in discordance with the nuclear phylogeny, "Haplochromis" gracilior of the Upper Nile lineage is highly divergent from the other Upper Nile lineage taxon, Thoracochromis pharyngalis, whereas T. pharyngalis is placed sister to Astatotilapia cf. paludinosa consistent with the nuclear phylogeny (see Supplementary Discussion).(b-e) Paleogeographic maps based on previously published data and reviews 1,2,[13][14][15][16][17][18][19][20] . Ellipses show the potential distribution range of the Congolese lineage, the Upper Nile lineage, and their common ancestor in red, blue and light grey, respectively. (b) In the early Pliocene, 4 Ma, Paleolake Obweruka had formed at the location of modern lakes Albert and Edward 15 . Like Lake Tanganyika, which also existed at this time, this paleolake drained into the Congo. (c) In the early Pleistocene, by 2 Ma, rifting and flank uplift along the Western branch of the East African Rift System had truncated the paleoriver network, separating the Lake Victoria Region from the Congo drainage system 13,18 and the Rwenzori Massif uplift had split paleo-Lake Obweruka into two smaller lakes 15 . About 400 thousand years ago (ka, time poorly constrained) regional tilting and further uplifting west of the present Lake Victoria lead to reversal of the flow direction of the Katonga (Kat.) and the Kagera (Kag.) Rivers, causing the filling of the Victoria basin, forming proto-Lake Victoria 21 which flowed over into the Nile 14 . Until today, the Lake Victoria region contains the headwaters of the Nile. During the time period between 145-15 ka, East Africa experienced several cycles of extremely arid and more humid periods 16,19 . In at least one of the more humid periods (e.g. ~145-120 ka 16,19,20 (d)), the Lake Victoria Region (Nile drainage system) was likely reconnected with the  (Seehausen et al., 2002). (e) Since the eruption of the Virunga volcanoes (grey triangles), around 11-25 ka 1,2 , the outflow of Lake Kivu to the North (Lake Edward) is blocked and the lake now drains southward through the Rusizi river into Lake Tanganyika, and thus into the Congo drainage system. Lake Victoria is connected to Lake Albert via the Victoria Nile, but the Murchison Falls prevent upstream fish passage. Lakes Albert and Edward are connected by the Semliki River, but rapids pose a barrier to upstream fish movement.  Table 4. (c) F4-ratio test formula to calculate the Upper Nile ancestry proportion from allele frequency (p) differences over n sites. The F4-ratio test is based on correlations between allele frequency differences between two pairs of populations. Allele frequency differences between two pairs of populations will be correlated if one of the populations shares a drift path (common ancestry) with a population of the other pair. As an example, if we take the genealogy of populations (((A, B), C), O) shown in panel a, the allele frequency differences between A and O would be correlated with the allele frequency differences between B and C because of the shared drift path between A and B (yellow-black dashed line). If e.g. at a given site the ancestral minor allele frequency at the root was low but increased through drift along the yellow-dashed branch, both A and B would have increased allele frequencies as they both share this yellow-dashed branch. Therefore, both the allele frequency differences between A and O and between B and C would increase. The magnitude of correlation between these two allele frequency differences is proportional to the shared drift path. If population B was of hybrid origin, it would only partially share the yellow-dashed drift path with A and the correlation between the allele frequency differences would weaken to the extent of allelic contribution of the other parental lineage. Here, the Wami River cichlids and "Haplochromis" gracilior share the branch dashed in yellow which leads to correlated allele frequency differences between the Wami River sp. and Astatotilapia flavijosephi (grey) and between A. sp. "Yaekama" and H. gracilior (green). This can be seen by the overlap of the grey and green drift paths along the yellow-dashed branch in (a). If an LVRS group was completely of Upper Nile origin (ancestor was a sister group of H. gracilior), all allele frequency differences between the LVRS group and A. sp. "Yaekama" (purple) would be due to drift along the drift paths indicated with solid purple arrows. The overlap of the drift paths between the grey and purple arrows would be equal to that of the grey and green arrows (at the yellow branch) leading to a ratio of f4 statistics of α = 1. If the LVRS group was completely derived from a Congolese lineage sister to A. sp. "Yaekama", the allele frequency differences between LVRS and A.      The topology on top left shows the naming scheme of the tested populations. Most bi-allelic sites will show allele sharing patterns in concordance with this tree (e.g. P1 + P2 share a derived allele (B), whereas the other taxa share the ancestral allele (A) with the outgroup (BBAAA)). Due to incomplete lineage sorting or gene flow, some sites will show discordant allele sharing patterns such as those illustrated with the topologies on the right. "A" denotes the allele found in the outgroup (putative ancestral allele, black branches in the topologies) and "B" denotes the alternative allele (grey branches). For the five-population test,