Genome-culture coevolution promotes rapid divergence in the killer whale

The interaction between ecology, culture and genome evolution remains poorly understood. Analysing population genomic data from killer whale ecotypes, which we estimate have globally radiated within less than 250,000 years, we show that genetic structuring including the segregation of potentially functional alleles is associated with socially inherited ecological niche. Reconstruction of ancestral demographic history revealed bottlenecks during founder events, likely promoting ecological divergence and genetic drift resulting in a wide range of genome-wide differentiation between pairs of allopatric and sympatric ecotypes. Functional enrichment analyses provided evidence for regional genomic divergence associated with habitat, dietary preferences and postzygotic reproductive isolation. Our findings are consistent with expansion of small founder groups into novel niches by an initial plastic behavioural response, perpetuated by social learning imposing an altered natural selection regime. The study constitutes an important step toward an understanding of the complex interaction between demographic history, culture, ecological adaptation and evolution at the genomic level.

The interaction between ecology, culture and genome evolution remains poorly understood. Analysing population genomic data from killer whale ecotypes, which we estimate have globally radiated within less than 250,000 years, we show that genetic structuring including the segregation of potentially functional alleles is associated with socially inherited ecological niche. Reconstruction of ancestral demographic history revealed bottlenecks during founder events, likely promoting ecological divergence and genetic drift resulting in a wide range of genome-wide differentiation between pairs of allopatric and sympatric ecotypes. The interplay between ecology, culture and evolution at the level of the genome remains poorly understood 1 . The ability to adapt to novel ecological conditions through behavioural plasticity is thought to be able to buffer natural selection pressures and promote rapid colonisation of novel niches 1 . However, by perpetuating exposure to a novel environment, stable cultural transmission of behaviour can also provide an opportunity for natural selection to act on adaptive genomic variation.
Examples of genomic adaptation in humans during the period of recent ecological and cultural diversification and consequent demographic expansion are well illustrated [1][2][3] .
For example, the Inuit of Greenland descend from a small founder population that split from an East Asian source population and successfully colonised the extreme climatic conditions of the Arctic environment through culturally transmitted methods of hunting marine mammals and genetic adaptation to a cold climate and hypoglycemic lipid-rich diet 4,5 . However, our understanding of the complex interaction between ecology, culture, adaptation and reproductive isolation at a genome-wide level have long suffered from deficiency of genome-wide data, and conceptually, from the almost exclusive focus on these processes in humans and thus a lack of comparative data from other species 6 .
Killer whales (Orcinus orca) are the largest species in the dolphin family (Delphinidae) and together with humans, are one of the most cosmopolitan mammals, being found in all ocean basins and distributed from the Antarctic to the Arctic 7 . This top marine predator consumes a diverse range of prey species, including birds, fish, mammals and reptiles 7 . However, in several locations killer whales have evolved into specialized ecotypes with hunting strategies adapted to exploit narrow ecological niches [8][9][10][11][12][13][14][15] . For example in the North Pacific, two sympatric ecotypes co-exist in coastal waters: the mammal-eating (so-called 'transient') ecotype and fish-eating (socalled 'resident') ecotype [8][9][10][11] . This ecotypic variation is stable among multiple subpopulations of the transient and resident ecotypes across the North Pacific 8-11 that diverged from common ancestral matrilines approximately 68 and 35 KYA respectively 16 . A highly stable matrilineal group structure and a long post-menopausal lifespan in killer whales is thought to facilitate the transfer of ecological and social knowledge from matriarchs to their kin 17 and thereby perpetuate the stability of ecotypic variation in killer whales 18 . In the absence of a definitional consensus and for the purposes of investigating how cultural phenomena interact with genes, culture has been broadly defined as 'information that is capable of affecting individuals' behaviour, which they acquire from other individuals through teaching, imitation and other forms of social learning' 1 . Several studies have argued that behavioural differences among killer whale ecotypes are examples of culture in this broader sense of the term 18,19 . However, this behavioural variation among ecotypes likely results from ecological, genetic and cultural variation and the interaction between them, rather than a single process explaining all behavioural variance 20 . Killer whales therefore offer a prime example of how behavioural innovation perpetuated by cultural transmission may have enabled access to novel ecological conditions with altered selection regimes, and thus provide an excellent study system for understanding the interaction between ecological and behavioural variation, and genome-level evolution.

Results and Discussion
Whole genome sequencing. We generated whole genome re-sequencing data of 48 individuals at low coverage and accessed high coverage sequencing data from two further individuals 21,22 to investigate patterns of genomic variation among killer whale ecotypes. The samples represent five distinct ecotypes that, based on phylogenetic analysis of mitochondrial genomes 16 , include among the oldest and youngest divergences within the species (Fig. 1). The dataset incorporated 10 individuals each of the transient and resident ecotypes that occur in sympatry in the North Pacific; and from Antarctic waters, 7 individuals of a large mammal-eating form (type B1), 11 individuals of a partially sympatric, smaller form which feeds on penguins (type B2), and 10 individuals of the smallest form of killer whale, which feeds on fish (type C) ( Fig. 1). A total of 2,577 million reads uniquely mapped to the 2.4-Gbp killer whale reference genome 21 ( Supplementary Fig. 1) for which a chromosomal assembly was generated for this study, so that approximately 50% of the autosomal regions of each individual were sequenced at ≥2× coverage (Supplementary Table 1). Subsequent data analyses used methods that account for uncertainty in the assignments of genotypes, enabling accurate inferences to be drawn from low-pass next-generation sequencing data 23,24 . Comparisons of estimated population genomic metrics such as genome-wide and per-site F ST indicated that estimates from our low coverage data were highly consistent with published high coverage RAD-seq 25 Table 3) and based on the 95% highest posterior density interval (HPDI) of the mutation rate estimate 26 . This equates to approximately 4,900−8,800 generations and indicates a rapid diversification, over a timescale comparable to the diversification of modern humans 27 . We caution that for these age estimates we rely on mutation rate estimates derived from interspecific comparisons among odontocetes, and that therefore these estimates of TMRCA are at best approximate.
Further, the demographic history and any gene flow between ecotypes will have an influence on the sharing of derived mutations and hence this estimate. However, we do expect that the estimate will be within the correct order of magnitude. Our estimated TMCRA overlaps with a recent RAD marker study 28 , which estimated a TMRCA of 189 KYA (only a point estimate was reported by these authors) scaling by a mutation rate 1.21 times higher than we have used here. The estimated TMRCA of a global dataset of killer whales based on non-recombining mitochondrial genomes has been estimated at 220-530 KYA 16 , older than our estimate based on nuclear genomes.
Recombination of the nuclear genome among killer whale lineages has therefore continued after matrilineal lineages have diverged.
Genetic differentiation and divergence. Despite this recent shared ancestry, substantial genome-wide differentiation and divergence had accrued between all pairs of ecotypes included in this study (Fig. 2 Fig. 3). Similarly, pairwise relatedness due to identity-by-descent (IBD), i.e. genetic identity due to a recent common ancestor was high within each ecotype. While the three Antarctic ecotypes still showed signs of recent relatedness, no shared recent IBD ancestry was detected between Antarctic and Pacific types or between the sympatric resident and transient ecotypes ( Supplementary Fig. 4). The greatest differentiation (Supplementary Table 2) as visualised in the maximum likelihood graph (Fig. 2a) and PCA plot (Fig. 2b) was between the allopatric Pacific and the Antarctic ecotypes, whilst differentiation among Antarctic ecotypes was much lower than between the two Pacific ecotypes. Thus, our sampled populations allowed us to investigate the accrual of genomic differentiation along points of a continuum, acting as a proxy of sampling at different stages of the speciation process 30 . The accrual of genome-wide differentiation (F ST = 0.09) between even the most recently diverged and partially sympatric ecotypes (Antarctic types B1 and B2) indicates that reproductive isolation quickly becomes established after the formation of new ecotypes. Thus whole-genome resolution confirms that even in sympatry, contemporary gene flow occurs almost exclusively among individuals of the same ecotype, allowing genomic differentiation to build up between ecotypes so that within an ocean basin, ecological variation better predicted genetic structuring than geography. The genomes of transients are therefore partly derived from at least one population related to the Atlantic and resident ecotype, (but not necessarily these populations, i.e. the source could be an un-sampled 'ghost' population). The asymmetrical 2D-site frequency spectrum (SFS) also implies directional gene flow from a population ancestrally related to the residents into the transient ecotype 35 (Fig. 2e). Given the extent of the inferred demographic bottlenecks during founder events (see section below), and the expected consequential shift in the SFS 36,37 , it seems likely that this admixture would have occurred during or after the founder bottleneck in the transient ecotype, otherwise it would be expected to be less correlated with the resident SFS.
The sequencing of more populations is expected to shed further light on this episode of ancient admixture. Previous studies have inferred on-going gene-flow between the resident and transient ecotypes 38,39 , however, the outcome of the suite of analyses applied here strongly imply reproductive isolation between present-day populations ( Fig. 2).
TreeMix, the three-population and D-statistic tests inferred that type B1 is admixed and derives from at least two populations related to both types B2 and C (Fig. 2a (Fig. 3a). The inference of the timing of these demographic declines is dependent upon the assumed mutation rate (µ), but across a range of sensible estimates of µ, the declines broadly fall within the Late Pleistocene 22 . This was previously interpreted as evidence for demographically independent population declines in each ocean, driven by environmental change during the Weichselian glacial period 22 . However, this inference assumes that each PSMC plot tracks the demographic history of a single unstructured panmictic population 41 . The y-axis of the PSMC plot is an estimation of N e derived from the rate of coalescence between the two chromosomes of a diploid genome. However, in the presence of population structuring, regions of the two chromosomes will coalesce less frequently as their ancestry may derive from different demes or sub-populations; the rate of coalescence can thus be similar to a single population with large N e . PSMC estimates of N e during population splits can therefore be greater than the sum of the effective sizes of the sub-populations, dependent on the number of sub-populations and the degree of connectivity (i.e. cross-coalescence) between them 27,41 (see Figure S5 of ref 27). In fact, the results presented here and previously published 16,25,28 indicate that throughout the Weichselian glacial period there were multiple population splits, both between and within ecotypes, including the splitting of the two lineages included in the PSMC analyses just at the point of the change in inferred N e ( Fig. 3a; Supplementary Fig 11).
Therefore, the PSMC plots will be strongly influenced by these changes in structure, and the changes in inferred N e are not necessarily associated with population declines (see Supplementary Fig. 12).
To investigate the influence of connectivity on our PSMC plots further, we inferred ancestral N e of the diploid X-chromosome (N eX ) of the Atlantic female and directly compared to the autosomal fraction (N eA ) of the genome presented in figure 3a. From around 300,000 to 130,000 years BP during the Saalian glacial period, the inferred N eX ranged from 0.58-0.79 of the N eA (Fig. 3b), which lies within expectation for demographically stable mammalian species 42 . During the first part of the Weichselian glacial period, N eX markedly declined, reaching a minimum at approximately 30,000-50,000 years BP. The timing of this bottleneck in N eX overlaps with the stem age of the mitochondrial clade for this Atlantic population 16 , i.e. consistent with almost all mitochondrial diversity being lost in this lineage during this period. Conversely, this is concurrent with the peak estimate of autosomal N eA inferred by PSMC, and ratio of N eX /N eA falls to approximately 0.3 at this N eX minimum and then recovers within 1,000 generations to >0.75 (Fig. 3b). Simulated demographic bottlenecks of several hundred-fold reduction result in a disproportionate loss of N eX , attributed to the difference in inheritance mode of each marker, and the ratio of N eX /N eA can reach less than 0.3 42 . Following the bottleneck, N eX recovers more rapidly than N eA and the ratio of N eX /N eA can exceed 0.75 during this recovery phase 42 . The concordance of the timing of the bottleneck in the X-chromosome and the stem age of the mitochondrial genome suggest an underlying demographic process, rather than strong selection on the X-chromosome or some other factor driving mutation rate 42 . A sex-biased process such as primarily male-mediated gene flow between demes could further influence the ratio of N eX /N eA 43 .
To estimate the demographic histories from our population genomic data we produced 'stairway' plots using composite likelihood estimations of theta (θ) for different SNP frequency spectra associated with different epochs, which are then scaled by the mutation rate to estimate N e for each epoch 44 . Using this method we reconstructed a demographic history from our population genomic data for the resident ecotype that was comparable to the PSMC plot from a single resident high coverage genome, both methods identifying a decline in N e starting at approximately 60 KYA (Fig. 3a,c). The stairway plots for the transient ecotype and type C showed the same pattern as the resident, of a decline to a bottlenecked population with an N e of <1,000, and a subsequent expansion (Fig. 3c). The bottlenecks did not occur simultaneously, as might be expected in response to a global environmental stressor during a glacial cycle, but instead were sequential (Fig. 3c). In each case, the timing of the demographic bottleneck overlapped with the previously estimated timing of the stem age of the mitochondrial genome clades containing each ecotype 16  ranging from a few tens to hundreds, a genome-wide contribution of ecologicallymediated divergent selection is neither necessary, nor particularly likely to explain the observed shifts in allele frequencies in such a large number of loci. Consistent with this prediction, we find that differentiation is highest along the branches inferred by TreeMix to have experienced the most substantial genetic drift (Fig. 2a), i.e. the branch to the ancestor of the Antarctic types and the branch to the resident ecotype ( Supplementary Fig. 13). We therefore expect that only those beneficial alleles that have a strong favourable effect (i.e. strength of selection (s) > 1/2Ne) would have an increased fixation probability due to selection within these founder populations.
Much of the heterogeneity in the differentiation landscape was shared among pairwise comparisons (Fig. 4c). Since diversity (π) was not associated with mutation rate µ, as inferred from neutral substitution rate dS, (r = -0.1 to -0.24), the observed co-variation in differentiation (F ST ), diversity (π) and absolute divergence (D xy ) between population pairs ( regeneration is thought to be constrained in killer whales whilst inhabiting the cold waters around Antarctica due to the high cost of heat loss, and is thought to underlie rapid round-trip movements to warmer subtropical waters by Antarctic ecotypes 56 .
The balance between skin regeneration and thermal regulation in Antarctic waters could be a major selective force requiring both behavioural 56  SNPs in that study 25 . Enrichment of these GO-terms was largely driven by differentiation in a single exon in the GATA4 gene, which included a fixed nonsynonymous substitution, sequenced in both studies. Nonetheless this overlap highlights that reduced representation libraries can provide useful genome-wide inference of candidate targets of selection.
Signatures of selection along branches leading to the two predominantly mammaleating ecotypes included in this study, the North Pacific transient and Antarctic type B1, were found in genes that play a key role in the methionine cycle (Fig. 5).
Methionine is an essential amino acid that has to be obtained through dietary intake, and is converted through trans-sulfurcation to cysteine via intermediate steps of catalysis to homocysteine 58 . Any excess homocysteine is re-methylated to methionine 58 . Diets with different protein content, such as between killer whale ecotypes, will differ in their content of methionine, and the enzymatic cofactors involved in the metabolism of methionine and homocysteine, which include folate, vitamins B6 and B12 59 (hence why vegetarians often take vitamin B12 supplements).
Whilst different genes and different biological processes showed a signature of selection in each of these two mammal-eating ecotypes ( Conclusions. Overall, our results indicate that the processes underlying genomic divergence among killer whale ecotypes reflect those described in humans in several respects. Behavioural adaptation has facilitated the colonization of novel habitats and ecological niches. Founder effects and rapid formation of reproductive isolation, followed by population expansion have promoted genome-wide shifts in the frequency of alternative alleles in different ecotypes due to genetic drift. Demographic changes during founder events and subsequent expansions can also influence cultural diversity 61,62 , and may have had a role in reducing within-ecotype cultural diversity and promoting cultural differentiation between ecotypes. As with studies on modern humans, it is difficult to demonstrate a causal association between cultural differences and selection on specific genes 1 , but our findings of divergence in genes with putative functional association with diet, climate and reproductive isolation broadly imply an interaction between genetically and culturally heritable evolutionary changes in killer whale ecotypes. Given these findings, the almost exclusive focus on humans by studies of the interaction of culture and genes 6 should be expanded, and exploration of culture-genome coevolution models in suitable non-human animal systems encouraged.

Sample collection.
Skin biopsies from free-ranging killer whales were collected using projected biopsy darts 63  For illustrative purposes (e.g. Manhattan plots) a synteny based chromosomal assembly of the reference genome was produced by aligning the killer whale scaffolds to a chromosomal assembly of the cow Bos taurus (Btau_4.6.1) genome using the Satsuma aligner 72 with default settings. Synteny was conserved and showed no largescale inter-nor intra-chromosomal rearrangements in any scaffolds ( Supplementary   Fig. 1). Inferences based on outlier peaks were not influenced by this superscaffolding process.
Filtered reads were further mapped to a reference mitochondrial genome (GU187176.1) and compared with previously published mitogenome sequences from these individuals 16,73 The assembled mitogenome sequences were a 100% match with those previously generated for these individuals using targeted sequencing approaches 17,73 . As previously reported 16,73 , the mitogenomes of each ecotype clustered in strongly supported mitochondrial DNA clades, with the exception that one type B1 individual sampled at a different geographic location to the other type B1 individuals, had a highly divergent mitogenome haplotype 16 . The relationships among these mitochondrial lineages can be seen in a cladogram of a maximum-likelihood phylogenetic reconstruction ( Supplementary Fig. 3a) generated as per reference 74.   Table 3). The results were highly consistent between individuals of the same ecotype (Supplementary Table 3).
The mean rate of nucleotide evolution estimated for odontocetes of 9.10 × 10 -10 substitutions/site/year (95% HPDI: 6.68 × 10 -10 , 1.18 × 10 -9 ) 26  Whereas all other admixture methods base their inference on called genotypes and implicitly assume that the genotypes are called without error, NGSadmix bases its inference on genotype likelihoods (GLs) and in doing so it takes into account the uncertainty in the called genotypes that is inherently present in low-depth sequencing data 23,24 . The method has been demonstrated, using simulations and publicly available sequencing data, to have great accuracy even for very low-depth data of less than 2fold mean depth 29 . GLs were estimated using the SAMtools method 71 80 . Briefly, the covariance matrix between individuals, computed as proposed in reference 81 , is approximated by weighting each genotype for its posterior probability, the latter computed using the ANGSD software package 78

as described in
Nielsen et al. 23 . The eigenvectors from the covariance matrix were generated with the R function 'eigen' and significance was determined with a Tracy-Widom test to evaluate the statistical significance of each principal component identified by the PCA. The PCA was plotted using an in-house R script (available at https://github.com/mfumagalli/ ngsPopGen/ tree/master/scripts).

Distance-based phylogenetic inference.
We generated 100 matrices of pairwise genetic distances, in which pairwise genetics distances were calculated using ngsDist 82 , which takes genotype uncertainty into account by avoiding genotype calling and instead using genotype posterior probabilities estimated by ANGSD. A block-bootstrapping procedure was used to generate 100 distance matrices, obtained by repetitively sampling blocks of the original data set (Supplementary Table 10).
Pairwise genetic distances among the 48 low coverage genomes and high coverage resident genome were visualised as a phylogenetic tree using the distance-based phylogeny inference program FastME 2.0 83 (Supplementary Fig. 3b).

Pairwise relatedness estimation.
We estimated pairwise relatedness due to identityby-descent (IBD), i.e. genetic identity due to a recent common ancestor, of every possible combination of two individuals using NgsRelate 84 . NgsRelate provides ML estimates of R, where R = (k 0 ,k 1 ,k 2 ) and k m is the fraction of genome in which the two individuals share m alleles IBD. NgsRelate provides maximum likelihood estimates of R by finding the value of R that maximizes this likelihood function with an Expectation Maximization algorithm using genotype likelihoods instead of genotypes to account for the inherent uncertainty of the genotypes. NgsRelate has been shown using simulations and real data to provide robust estimates for low-depth NGS data (as low as 1-2×), which are markedly better than genotype-based methods 84 . The full results are reported in Supplementary Table 11, and a summary is presented in Supplementary Fig. 4.

Genetic differentiation (F ST ), divergence (Dxy) and nucleotide diversity (π).
Measures of genetic differentiation, divergence and diversity were estimated using methods specifically designed for low-coverage sequencing data. Using a Maximum Likelihood-based approach previously proposed 23  from RAD-seq data generated in a previous study 25 Fig. 2a). We then performed 1,000 replicates, randomly sub-sampling with replacement ten individuals from each ecotype. The correlation between F ST estimates from these random sub-samples ranged from 0.6861 to 0.9372. We found the correlations between estimates of F ST from the RAD-seq data and those from our WGS data ranged from 0.5475 to 0.7140 ( Supplementary Fig. 2b). The significant correlation in estimates of F ST between two different methods using different individuals suggests that these estimates are reliable.
The average number of nucleotide substitutions Dxy 88 Table 12). Estimates of nucleotide diversity can be influenced by differences in sequencing coverage and sequencing error.
However, it has been shown that using an empirical Bayes approach, implemented in ANGSD 78 , the uncertainties in low-depth data can be overcome to obtain estimates that are similar to those obtained from datasets in which the genotypes are known with certainty 89 . Multiple checks were performed to ensure that estimates of π were not an artefact of the data filtering. Comparable estimates of π were obtained using the method implemented in ANGSD for a single 20× coverage 'resident' genome (π = 0.0009) when it was randomly down-sampled to 2× coverage (π = 0.0008).
Nucleotide diversity was estimated for sites covered by at least 5 individuals in each population in windows of size 50, 100 and 200-kb (Supplementary Table 12). PSMC plots of genomes with <20× coverage have been shown not to be directly comparable to higher coverage genomes without first applying a correction for an appropriate false negative rate of detecting heterozygotes 90 . Additionally, the effect of mapping short read data to a reference genome comprised of short contigs may also be problematic, especially when many contigs fall below ~50-kb (the typical size of shared fragment size from 1,000 generations ago in humans), although to the best of our knowledge, the effect of mapping to different quality reference assemblies on PSMC analysis has not been tested to date. Moura et al. 22 Fig. 9) even though the two individuals shared a relatively recent common ancestor (TMRCA estimated at ~150 KYA by the same authors 28 ). In order to better understand if these methodological issues (<20× coverage sequence data mapped to a highly fragmented reference) led to erroneous inference of the demographic histories and the underlying processes in this previous study 22 , PSMC was used to analyse down-sampled versions of the high coverage North Atlantic killer whale genome.
A 50× bam file was produced by mapping the short read data generated from the North Atlantic killer whale to scaffolds of the autosomal regions of the high quality killer whale reference genome that were greater than 10-Mb in length, totaling 1.5 Gb and which had all repeat regions masked as noted above. The 50× coverage bam file was then down-sampled to produce 13× and 20× coverage bam files. An additional 20× bam file was produced for a North Pacific killer whale using data from the short read archive (SRP035610) 22  in N e , consistent with findings by a previous study 90 . In particular, estimates of N e in more recent times based on the 13× genome assembly differed markedly to inferred Ne from the 20× and 50× genome assemblies (Supplementary Fig. 10). This is a consequence of a higher false negative detection rate of heterozygote sites in the 13× genome assembly, producing the same effect as a smaller mutation rate would have on the plot. The PSMC plots of the 20× and 50× coverage North Atlantic genome were almost identical both regarding the timing and the magnitude of demographic events. Applying an additional correction for a uniform false detection rate of 2% further increased the similarity of the two plots and suggested that by applying this correction the 20× North Pacific killer whale genome could be directly compared with the 50× North Atlantic killer whale genome using PSMC.
All North Pacific resident lineages (estimated as above) and presented in Fig. 3a. Lastly, we simulated data consistent with a population split using ms 91 and plotted the changes in effective population size inferred by PSMC ( Supplementary Fig. 12).
A final PSMC plot of the autosomal regions of the North Atlantic female killer whale at a coverage of 50× was scaled to the autosomal mutation rate (µ A ) of 2.34×10 -8 substitutions/nucleotide/generation 26 , as used in our estimation of TMRCA (see above) and compared with a plot of the diploid X-chromosome, scaled to real time as per reference 27 in which the neutral mutation rate of the X-chromosome was derived as µX=µA [2(2+α)]/ [3(1+α)], assuming a ratio of male-to-female mutation rate of α = 2 92 . This gave us an estimated µX = 2.08×10 -8 substitutions/nucleotide/generation.
The plot is presented in Fig. 3b. We also re-estimated the PSMC plot for the Xchromosome using different mutation rates to investigate which rate would produce PSMC plots with inferred concurrent declines in N e in autosomes and X-chromosome.
We found that µX = 1.00×10 -8 substitutions/nucleotide/generation would be needed to synchronise the inferred demographic changes in these two markers ( Supplementary   Fig. 17). This would require the male-to-female mutation rate (α) to be orders of magnitude higher, making it seemingly biologically unrealistic.
To reconstruct ancestral demography in the ecotypes for which we did not have a high coverage genome, we applied a method that uses the site frequency spectrum from population genomic data to infer ancestral population size changes 44  Holocene. It should be kept in mind that the PSMC plot is based on data from a single individual and so will track declines in Ne due to further founder effects as the resident ecotype continues to split into multiple discrete populations. In contrast, the Stairway Plot is based on population genomic data and will track the change in Ne across all the sampled resident populations after they have split.
The method was then applied to the site frequency spectra of the transient ecotype and type C. The results are shown in Fig. 3c, and in each case the Stairway Plot infers a sudden and dramatic demographic decline, consistent with previously inferred population split times followed by a demographic expansion. The timing of the decline of the Antarctic types overlapped due to the recent shared ancestry and so only the plot for type C is shown in Fig. 3c for clarity.
Inferring putatively functional allele shifts due to selection. Shifts in allele frequencies can occur due to selection, but differences in allele frequencies can also accumulate between populations due to drift 45,46        respectively, and to form S-adenosylhomocysteine (SAH) 99 . SAH is converted to homocysteine, which is then catalysed by cystathionine β-synthase, an enzyme encoded by the CBS gene to cystathionine, which is then converted to cysteine 99