The ash dieback invasion of Europe was founded by two genetically divergent individuals

Accelerating international trade and climate change make pathogen spread an increasing concern. Hymenoscyphus fraxineus, the causal agent of ash dieback, is a fungal pathogen that has been moving across continents and hosts from Asian to European ash. Most European common ash trees (Fraxinus excelsior) are highly susceptible to H. fraxineus, although a minority (~5%) have partial resistance to dieback. Here, we assemble and annotate a H. fraxineus draft genome which approaches chromosome scale. Pathogen genetic diversity across Europe and in Japan, reveals a strong bottleneck in Europe, though a signal of adaptive diversity remains in key host interaction genes. We find that the European population was founded by two divergent haploid individuals. Divergence between these haplotypes represents the ancestral polymorphism within a large source population. Subsequent introduction from this source would greatly increase adaptive potential of the pathogen. Thus, further introgression of H. fraxineus into Europe represents a potential threat and Europe-wide biological security measures are needed to manage this disease.

H uman movement and climate change may increase the incidence of plant pathogen introductions as new environments become favourable 1 . Dutch elm disease is one example of a pathogen epidemic that emerged and caused the loss of billions of elm trees globally 2 . Presently, the ascomycete fungus Hymenoscyphus fraxineus is causing disease and death in European common ash, Fraxinus excelsior, and narrow-leaved ash, F. angustifolia. H. fraxineus jumped host from Asian ash species where it is a leaf pathogen with little impact on its host. In Europe it is killing ash at an alarming rate and displacing the non-aggressive indigenous fungus, H. albidus 3 .
The disease was first observed in Europe in north-western Poland in 1992 and, moving west, was identified in the UK in 2012 3 . Less than 5% of trees are partially resistant or tolerant 4,5 to ash dieback disease, which is characterized by dark brown/orange lesions on leaves followed by wilting, necrotic lesions on shoots, then diamond-shaped lesions on the stems and finally dieback of the crown 6 ( Fig. 1). The loss of leaves in the crown of mature trees proceeds over years and leads, in severe cases, to tree death. H. fraxineus is a heterothallic fungus 7 that has been shown to reproduce asexually in vitro 8 , but in the wild, sexual reproduction occurs annually on fallen leaf rachises in the leaf litter 6 (Fig. 1).
A pathogen's evolutionary potential is rooted in its genetic diversity (or its effective population size) 9 . All else being equal, large populations can adapt more quickly than small ones for two reasons; first, a large population carries more polymorphism and so there is a greater chance that a favourable mutation is segregating. Second, the impact of random genetic drift is lower for larger populations and therefore, natural selection is better able to drive those favourable mutations to fixation 10 . However, pathogen introductions are often associated with genetic bottlenecks, or founder effects, which reduce the level of genetic diversity and the efficacy of selection. This disparity between reduced polymorphism and invasion success is known as the genetic paradox of biological invasions 11 . The potato late blight pathogen, Phytophthora infestans, is another example of a successful bottlenecked pathogen introduction 12 . However devastating a pathogen may be, multiple introductions can increase diversity above that of native populations 11 . Early P. infestans invasions were dominated by a single clonal lineage (US-1), which was later superseded as diversity and severity increased 12 . The Dutch elm disease pandemic(s) also had an initial rapid spread of Ophiostoma ulmi which was later replaced by O. novo-ulmi (also subspecies americana), and went on to devastate elm populations across two continents 2 . The amount of H. fraxineus genetic variation introduced to Europe, and the potential for further introduction, is therefore of critical concern.
Estimates of H. fraxineus microsatellite allelic richness suggest as few as two haploid individuals may have invaded Europe 13 . Interestingly, the viral pathogen of H. fraxineus (mitovirus 1) also has two genetic groups present in the European population 14 . Given the impact of the pathogen invasion so far, a founder effect of just two haploid individuals would represent a genetic paradox and so it is important to measure genetic diversity across the genome as well as meaningful, adaptive diversity in the host interaction genes (effectors). Determining the genetic diversity present in the native range is key to understanding the consequences of any further introductions. Here, we assembled and annotated a high-quality draft genome of H. fraxineus. We quantified the level of genetic variation in 43 H. fraxineus haploid isolates from across Europe as well as 15 haplotypes from part of its native range (Japan). To understand the adaptive potential of the ash dieback pathogen in Europe as well as the potential of future invasions, we determined the effective population size, the bottleneck into Europe and estimated the size of the source population. In addition, we mined putative effector genes from the genome of H. fraxineus. These genes encode secreted proteins considered to play a key role in establishing fungal infections and facilitating disease development 15 . We measured signals of adaptive diversity in the putative effectors and all other genes. The proportion of introduced adaptive variation and the potential for further introduction are a key focus of the present work.

Results and discussion
Population genetic diversity of the emerging invasive H. fraxineus pathogen measured among 43 isolates from across Europe is an eighth of that observed in 15 haplotypes from a single wood in Japan. This general reduction in polymorphism, caused by a bottleneck of two divergent individuals, could reflect a reduction in the pathogen's adaptive potential in its introduced environment. This reduction in genetic diversity is present in effectors as well as all other genes. Effectors from European isolates retain a greater degree of adaptive variation but far less than in the native range and we discuss the implications for the predicted level of virulence in Europe over the long term.  Table 2). Genetic linkage between scaffolds in the progeny of a H. fraxineus lab cross suggests linkage between three pairs of scaffolds missing a telomere (Supplementary analysis 4). By joining these linked scaffolds, we estimate 20 chromosomes (pseudomolecules) for H. fraxineus (Fig. 2b).

H. fraxineus
The European invasion population is bottlenecked and sexually recombining diversity from Asia. In 43 European and 15 Japanese haplotypes, we identified 6.26 million single nucleotide polymorphisms (SNPs) overall (SNPs segregating in: Japan = 4.5 × 10 6 , Europe = 0.67 × 10 6 ; Supplementary Analysis 5). An SNP network of all genes shows us that the European population is genetically divergent and bottlenecked from the native population (Fig. 3). The disparity in nucleotide diversity (π) between Europe and Japan ( Fig. 3) could have resulted from a source population much smaller than that of Japan. Tajima's D 17 is a statistic, centred around zero, that is sensitive to changes in effective population size (and/or the mode of selection). In Japan, we observe a Tajima's D value close to zero (x = − 0.22), which indicates neutrality with purifying selection operating on genes (x = − 0.89; Fig. 3). In Europe, the genomic signal is broadly positive (x = 1.28; Fig. 3). This positive value is generated by a contraction in population size balancing allele frequencies by reducing the frequency of common alleles without equivalent reduction in the frequency of rarer alleles (see Supplementary  Fig. 10). This balancing of allele frequencies without a real reduction in SNP density is consistent with a small founder European population from a larger source.
Sex is a fundamental determinant of plant pathogen adaptive potential not least because it allows recombination of host interaction genes, or effectors 18  Articles Nature ecology & evolutioN interval = 0.54-0.56)) but recombination decouples relationships between genes. Across the genome, genetic differentiation correlates positively with diversity present in the Japanese population (R 2 = 0.18, n = 627, P < 0.001) but negatively with diversity present in the European population (R 2 = 0.51, n = 610, P < 0.001). That is to say that, the more diversity present at a locus in Japan, the less likely it is to be shared with Europe and, the more diversity present at a locus in Europe the more likely it is to be shared with Japan; a signal consistent with directionality from east to west (Fig. 2fg, Supplementary Fig. 11). Preventing continued gene flow from the native range into Europe is important because sexual reproduction between divergent demes may be important for plant pathogen adaptation 19 .
The European population was founded by two individuals from a large diverse population. A previous study 20 reported similar levels of genetic diversity (based on 11 microsatellite loci) between central European populations and those on the epidemic disease front. The authors suggest that for introduced diversity to reach the epidemic disease front, either the centre of genetic diversity quickly recombined and spread by range expansion, or only a small number of divergent H. fraxineus isolates arrived in Europe and present-day diversity is the product of recombination among those. Here, we use third-base positions of the coding regions of core eukaryotic genes (CEGs) to distinguish between these scenarios and describe the invasion process. CEGs are highly conserved essential proteins, encoded in all eukaryotic genomes 21 . Their importance to Hf-15 Hf  The average number of Japanese haplotypes was 12.6 with no less than two per locus. Strikingly, European CEGs grouped into two divergent haplotypes. Furthermore, the level of divergence between those haplotypes reflects that across Japanese CEG networks  . Encircled numbers are used in a to indicate numbers of individuals that share each haplotype. The three gene networks from the European population in a are the same genes as those from the Japanese population in b and drawn to the same scale. c,d, Density plots show the relative pairwise distance amongst haplotypes of all 387 CEGs in Europe (c) and Japan (d). In Japan, pairwise distances between haplotypes range in their divergence (d). In Europe, the majority of genes are either identical or at the complete opposite ends of the divergence range (c). The high level of divergence that separates European haplotypes, which are shared by many individuals, is visible in the three gene networks (a) but also in the measure of pairwise haplotype divergence of all 387 CEGs (c). This represents the presence of two major divergent haplotype groups in Europe.

Nature ecology & evolutioN
( Fig. 4, Supplementary Fig. 20). It is our interpretation that these divergent haplotype pairs have been introduced by two haploid founders, as suggested previously 13 . Moreover, the divergence between these European haplotype pairs, without intermediates, represents the ancestral divergence of a large population from which the European population was founded.
The observed estimate for the haploid effective population size (ϑ π = 2N e μ) in Japan is 2.46 million individuals (π = 0.0246) and the estimate from Europe is 0.67 million individuals (π = 0.00672; μ = 5 × 10 −9 per base per generation). However, this European estimate of nucleotide diversity is inflated by the ancestral divergence of the two haploid founders. We used this divergent polymorphism to estimate the size of the European source population. Coalescence simulations of a hypothetical European source, bottlenecked to two individuals, show that as the source effective population size increases, so too does the average divergence between founder haplotypes. We find that a source effective population size between 2.2 and 2.8 million individuals (median 2.5 million) most accurately replicates the observed European haplotype divergence (Fig. 5). Our estimate of the size of the source of the European H. fraxineus population is therefore equivalent to that from a single Japanese wood. These estimates may reflect the equilibrium diversity in any given H. fraxineus population within its native range, but importantly also suggests that the European invaders could have come from a single site, perhaps even from a single ascocarp (fruiting body).
10% of the H. fraxineus proteome putatively interacts with the host. Effectors are a broad classification of proteins that are secreted by bacteria, oomycetes, fungi, nematodes and aphids in order to interact with the host, disable host defence components and facilitate colonization 15 . As such, effectors are in a coevolutionary arms race with host resistance genes and are studied for insights into pathogen adaptive potential 22 . We used localization signals of N-terminal presequences to identify secreted proteins in the H. fraxineus genome (Fig. 2e). These 1,132 predicted secreted proteins are potential effectors, which clustered into 566 tribes (Supplementary Table 5). We did not identify any conserved fungal N-terminal signal motifs. A reduced set of 223 putative effectors were identified using a machine-learning approach (Supplementary Analysis 7). Nevertheless, we consider the broadest set of secreted proteins in subsequent analyses to avoid missing out on potentially important effectors.
Putative effector genes are spread across the genome, away from repeat regions, similar to non-effector genes (Fig. 2c-e;  Supplementary Figs. 21,22) which contrasts with predictions of the two-speed genome model, in other filamentous plant pathogens 23 (for example, Sclerotinia sclerotiorum 24 ). Absence of the two-speed evolution model has been observed in the genomic landscapes of members of the Magnaporthaceae family 25 . Perhaps, in H. fraxineus, the requirement for a dynamic process that operates to shuffle and generate novel effectors is lower in this sexually reproducing pathogen with high haplotype diversity 26 . The presence of sex and high haplotype diversity in turn indicate a long-term balanced relationship between host and pathogen within the native range 27 Structure and function of H. fraxineus putative effectors. Of the 1,132 predicted H. fraxineus secreted proteins, 62% carry Pfam domains associated with highly diverse biological functions (Supplementary Table 6). The largest subgroups include apoplastic catalytic enzymes, with 127 proteins predicted to have glycosyl hydrolase activity. Amongst these are putative cell-wall-degrading enzymes such as cellulases, pectinoesterases and cutinases. 77 predicted secreted proteins have oxidoreductase activity, 71 carry a cytochrome P450, 56 harbour conserved domains of unknown function and 396 did not carry a Pfam domain. H. fraxineus has a large number of predicted secreted Cytochrome P450 proteins. Cytochrome P450s' modes of action are typically via the monooxygenase reaction, which is important in the generation and destruction of chemicals especially aromatics. In H. fraxineus, they may be important in pathogenesis, especially as they carry a signal of diversifying selection (Supplementary Analysis 8). Potential roles for P450s in pathogenesis include: (1) destruction of ash-tissuederived aromatic compounds with antifungal properties. Plant secondary metabolism pathways are active during the infection process of Magnaporthe oryzae 28 . (2) Penetration of ash tree tissues through extracellular oxidation of wood and metabolism of hydrocarbon compounds, which are the main constituents of host plants cuticle 29 . (3) P450s are part of the biosynthetic pathways for mycotoxins and phytohormones secreted by H. fraxineus during invasion of ash tissue, similar modes of action have been shown for Fusarium multifunctional cytochrome P450 monooxygenase Tri4 30 . The presence of such a high predicted number of P450s in the secretome of H. fraxineus suggests biochemistry unique to H. fraxineus.
Transcribed low-complexity domains may provide effectors with flexibility and diversity driven by adaptive evolution 31 . Of 1,132 effectors, 22% contain short repeats (for example, tetratricopeptide repeats and leucine-rich repeats; Supplementary Table 7). Some are predicted to have a nuclear localization signal often indicative of intracellular function in host cells whilst others have possible apoplastic function. Many putative effectors (37%), are cysteine rich with between one and five predicted disulphide bonds (Supplementary Table 8), which are predicted to confer stability, biological activity and resistance to proteases in the apoplastic space 32 and inside infected host cells 33 .
We identify three effectors (one of them previously identified 16 ) with an NPP1 (necrosis-inducing Phytophthora protein) domain, which is present in fungal, oomycete and bacterial proteins that induce hypersensitive-reaction-like cell death upon infiltration into plant leaves 34 . Two other proteins have predicted cell death activity and two more carry S1/P1 nuclease activity domains, which are involved in non-specific cleavage of RNA and single-stranded DNA.

Nature ecology & evolutioN
Host cell death induction is suggested to be common among facultative parasites that engage in a necrotrophic lifestyle 35 .
Adaptive effector diversity is present in Europe, but is far greater in the native range. We analysed genetic diversity in all the genes of the European and Japanese populations (Supplementary Table 9). First, there remains a significant positive correlation in the level of polymorphism present in genes between the invasive European and native Japanese populations, despite an 81% reduction in SNP density in Europe (Fig. 6). This reduction in the level of polymorphism has impacted putative effector and other genes alike, whereas, in Japan putative effector genes maintain significantly greater levels of polymorphism than that of other genes (Fig. 6). Putative effectors in Europe have an increase in SNPs that affect splice regions as well as 5′ UTRs. Importantly however, there are also increases in SNP densities in synonymous, intron, up-and downstream positions in putative effectors relative to other genes (Supplementary Analysis 8). Therefore, increases within these effector features could be the product of increased SNP density through linkage and balancing selection 36 operating on effectors.
Pathogen effectors must adapt to avoid host recognition and so they are expected to undergo positive, diversifying and balancing selection 15 . To investigate the role of selection in the maintenance of polymorphism we measured adaptive diversity using the mean pairwise ratio of non-synonymous (P N ) to synonymous (P S ) polymorphism present in all genes. The P N :P S ratio can detect the presence of adaptive evolution when applied to short regions of a gene, for example, a binding domain, because a value greater than one indicates positive or diversifying selection (functional divergence) 37 and P N :P S close to zero indicates negative or purifying selection. Here, we apply the P N :P S ratio across whole genes, which we presume have different modes of selection operating across them. However, we expect effectors will retain an overall higher P N :P S value despite dilution by purifying selection operating over the rest of the gene.
Approximately 97% of all P N :P S values in the European and Japanese effectors are lower than one. Within this range, we observe a significant increase in the effector P N :P S over that of other genes between 0.2-0.5 (Fig. 6). This represents a reduction in the strength of purifying selection in effectors relative to other genes. At P N :P S values greater than one, we observe a significant peak in the density for effectors which, indicates a number of effectors evolving under contemporary positive selection (Fig. 6). Effectors in Japan have higher P N :P S values than other genes (Fig. 6). In Europe, effector adaptive diversity has been maintained despite the bottleneck, albeit at a lower level than in the native range (Fig. 6). Finally, consistent with linkage and balancing selection 36 , the level of synonymous polymorphism is also higher in effectors in both Japan and Europe (Supplementary Fig. 25). Pseudogenes too, may have a P N :P S ratio approaching one. Pseudogenization could be more readily observed for effectors increasingly recognized by the immune system 38

Conclusions
The bottleneck of H. fraxineus into Europe removed the majority of neutral and much of the adaptive genetic variation. Despite this, the pathogen has devastated ash from east to west and host defence is characterized in terms of levels of susceptibility 5 . Further introduction of pathogen genetic diversity runs the risk of increasing disease prevalence. Successive invasions can increase the level of genetic diversity above that of native populations 39 and drive temporal fluctuations in adaptive diversity 40 . Increased virulence at invasion onset, due to uninfected host density, may be attenuated by natural selection on the pathogen as uninfected density reduces 41 but this is much less probable where immigrations continue. Tree pathogens can be introduced through trade in live trees and untreated wood 1 . If multiple native genotypes can invade, we face a situation where further immigration is likely to be accompanied by extreme increases in the level of adaptive polymorphism and disease severity. It is most important to prevent further introduction of pathogen diversity to Europe, particularly by prohibiting imports of Fraxinus species from East Asia, including material that may contain leaf debris bearing ascocarps.
The H. fraxineus native range, Asia, is large and the fungus lives there on the leaf litter of several ash species. Broader population genetic analyses over this range will allow us to address important questions on H. fraxineus adaptation to specific hosts. Moreover, the two-haploid founder scenario provides a unique system to unravel the population genetics of invasion with genome evolution. Finally, efforts to gain an understanding of disease progression will combine effector nucleotide statistics with targeted RNA-seq experiments.
Notwithstanding the potential impact of an introduction of the emerald ash borer 42 , current levels of European pathogen genetic diversity may infect and kill 95% of all European ash. We must consider the implications of any further introduction of diversity from the ash dieback pathogen to an already dire situation.

Methods
H. fraxineus collection and sequencing. 44 H. fraxineus isolates were collected from Europe (21 UK, 13 Norway, 5 France, 4 Poland, 1 Austria) and 9 from Japan (Supplementary Table 1). Japanese isolates were made up of one haploid and eight fruiting bodies (see isolate phasing below). Fruiting bodies were stored in ethanol and haploid isolates were cultured on malt extract agar 43  The annotation pipeline used for genome Hf-v1.1 16 was replicated on Hf-v2 adding newer RNA-seq data from the Open Ash Dieback repository (Supplementary Analysis 3). The Hf-v2 genome was repeat masked using de novo models, known fungal and low-complexity repeats. Annotations (9,737) from Hf-v1.1 were transferred to Hf-v2. Augustus v2.7 45 was used to predict additional protein-coding genes using protein and RNA-seq alignments (16 libraries, Supplementary Table 16) with the previously trained H. fraxineus model. We ran a transcript extension protocol on all genes to identify whether those genes within 100 bp of an upstream gene had evidence for extension. We also checked this extended gene dataset for potential secreted proteins. This protocol allowed the identification of 36 genes that had not previously been recognized as having a methionine start codon which were included into our effector-mining pipeline (Supplementary analysis 3).
We used a pipeline described previously 32 to identify putative effectors. Briefly, transcripts with a signal peptide were identified and from these we remove transcripts with transmembrane or mitochondrial localization signals and then transcripts with repeats, the disulphide bonds and nuclear localization signals are identified. Finally, secreted proteins were clustered into tribes (see Supplementary analysis 8).
Mapping and SNP calling. Reads were trimmed for Illumina adapters and Phred quality (-q20) and read alignment was performed using BWA-MEM v0.7.7 46 after which duplicates were removed. VCFtools v0.1.13 47 was used to filter variants to a minimum depth of ten, maximum depth of 1.8 × mean individual depth and a SNP genotype quality of at least 30. SNP sites with more than two alleles were excluded as probable errors, as were sites reported as heterozygous in haploids. Finally, sites that were missing in ≥ 20% individuals were removed. Three samples were removed from further analyses because of insufficient depth or evidence of contamination, leaving 20 UK, 13 Norway, 5 France, 4 Poland, 1 Austria, and 8 Japan (1 haploid 7 fruiting bodies; see Supplementary Analysis 1).
Phasing fruiting body samples. We used Shapeit2 48 , to phase fruiting body data, this first applies a phase informative read step (extractPIRs) to group SNPs present on the same read pair and then phases remaining SNPs using population level polymorphism (assemble --states 1,000 --burn 60 --prune 60 --main 300 --effective-size 564,000 --window 0.5). All Japanese samples including the haploid strain (as control) were phased. As required indels were removed from the dataset before phasing. Effective population size (ϑ = 2N e μ) was calculated assuming a mutation rate of 5 × 10 −9 per generation per site 49 using the mean nucleotide diversity present in 100-kbp windows in the Japanese population.
SNP diversity and divergence analysis. Polymorphism statistics and sliding windows, π and F ST were conducted using VCFtools v0.1.13 47 . DNAsp v5.10.01 50 was used to calculate population genetic statistics per gene and PAML v4.9 51 was used (YN00) in pairwise mode to calculate the average P N :P S ratio nonsynonymous (P N ) and synonymous (P S ) values for polymorphism in each gene with at least one synonymous mutation. The sm package 52 in R v3.2.1 compared the density distributions of P N :P S . Output data for all genes are combined in Supplementary  Table 9 (see also Supplementary Analysis 6). Data are represented using SplitsTree v4.13.1 53 using the neighbour-net algorithm and CIRCOS v0.67-5 54 .
Parental cross and linkage analysis. Two H. fraxineus isolates of opposite mating types, LWD054 and LWD067, were collected from stem lesions on trees in Norfolk, UK in 2014. Crosses were made according to a previously reported method 55 except that isolates were kept at 25 °C in 16 hours light. KASP 56 markers were designed to SNPs between two H. fraxineus parents mapped to the Hf-v2 assembly (Supplementary Analysis 4). Linkage between scaffolds was ascertained using chisquared test of sites close to ends of the 23 scaffolds, using 28 F 1 offspring. Scaffolds at linkage group ends were examined for the presence of telomeric motifs.
Statistics. R v3.2.1 57 was used to fit simple regression models to test the level of association between population signals in order to show directionality in the invasion and portray the population contraction (founder effect) behind Tajima's D. To evaluate the correlation in gene diversity (the proportion of SNPs per gene) in Japanese and the European populations we log-transformed SNP density excluding genes with less than two SNPs (leaving 6,216 genes). SNP density regression models were calculated on log-log data and plotted on the original scale. We also correlated feature coverage (that is, repeat content) of regions with gene and effector coverage to understand genome evolution. Two-sample Wilcoxon tests (one tailed indicated by '> ' or '< '; two tailed indicated by '~'), were used to test for differences in signals of nucleotide diversity, Tajima's D and P N :P S between non-effector and effector genes in European and Japanese populations. Bootstrap analyses compare variants effect by feature or gene (i.e. SnpEff output) and involved sampling feature with replacement (× 1,000). Bootstrapped comparisons were tested using a randomization test and 95% confidence intervals are presented.
Ancestral effective population size. To estimate the size of the population from which the European population was founded we used the SNPs at the third base pair in codons within 387 core eukaryotic genes. CEGMA v2.4 21 initially identified 440 CEGs which was reduced to 387 genes that were confirmed by reciprocal blastp (BLAST v2.2.31 58 (-evalue 1 × 10 -5 ) with a minimum of 90% coverage (removing technical assembly errors and biological duplicates), removing CEGs with nonsense variants (expected pseudogenes) and a minimum mean depth of 10× per gene across all individuals (poorly covered CEGs). Remaining genes were then run through a pipeline to quantify the pairwise distance between alleles (the level of nucleotide polymorphism, or distance between haplotypes, is defined as the number of SNPs (S) divided by the total length of the sequence (N) using SplitsTree) and the diversity per gene (using DNAsp v5). Relative pairwise distances between (third base pair) haplotypes is plotted as a density including all 387 CEGs combined. CEG networks are shown as a visual demonstration of the shapes of three polymorphic networks in Japan and Europe HYMFR746836.2.0_000003240.1, -000004200.1, -000061770.1) The average number of base pairs (third codon base pair) per CEG was 448 and this number was used as an input to fastsimcoal2 v2.5.2.8 59 , a fast sequential Markov coalescent simulator used to estimate the size of the population from