The ash dieback invasion of Europe was founded by two individuals from a native population with huge adaptive potential

Accelerating international trade and climate change make pathogen spread an increasing concern. Hymenoscyphus fraxineus, the causal agent of ash dieback is one such pathogen, moving across continents and hosts from Asian to European ash. Most European common ash (Fraxinus excelsior) trees are highly susceptible to H. fraxineus although a small minority (~5%) evidently have partial resistance to dieback. We have assembled and annotated a draft of the H. fraxineus genome which approaches chromosome scale. Pathogen genetic diversity across Europe, and in Japan, reveals a tight bottleneck into Europe, though a signal of adaptive diversity remains in key host interaction genes (effectors). We find that the European population was founded by two divergent haploid individuals. Divergence between these haplotypes represents the 'shadow' of a large source population and subsequent introduction would greatly increase adaptive potential and the pathogen's threat. Thus, EU wide biological security measures remain an important part of the strategy to manage this disease.

7 introduced diversity to quickly spread to the invasion front either, the centre of genetic diversity quickly 147 recombined and spread by range expansion, or only a small number of divergent H. fraxineus isolates arrived 148 in Europe and present day diversity is the product of recombination among those divergent isolates. Here, 149 we use third base positions of the coding regions of Core Eukaryotic Genes (CEGs) to distinguish between 150 these scenarios and describe the invasion process. CEGs are highly conserved proteins essentially encoded in 151 all eukaryotic genomes 34 . Their importance to fundamental eukaryotic biology makes the 387 CEGs we 152 identified in all H. fraxineus individuals an ideal set to explore the European invasion whilst minimising the 153 impact of processes associated with polymorphism generation operating on other genes 35 . The bottleneck 154 into Europe has reduced the number of CEG haplotypes to 2.3 and 42% of all CEGs were monomorphic. 155 Whereas, in Japan, the average number of haplotypes was 12.6 with no less than two haplotypes per locus. 156 Strikingly, European CEGs are grouped into two divergent haplotypes and the majority of individuals have 157 one or the other of the two main haplotypes. Furthermore, the level of divergence between those haplotypes 158 reflects that across Japanese CEG networks (Fig. 3, Supplementary Fig. 18). It is our interpretation that 159 these divergent haplotype pairs present in Europe have been introduction by two haploid founders as first 160 suggested by Gross et al. 3 . Moreover, the divergence between these European haplotype pairs, without 161 intermediates, represents the shadow of an ancestral population from which the European population was 162 founded. 163 The observed estimate for the haploid effective population size (ϑπ = 2Neμ) in Japan is 2.46 million 164 individuals (π = 0.0246) and the estimate from Europe is 0.67 million individuals (π = 0.00672; μ = 5×10 -9 165 per base per generation). However, this European estimate of nucleotide diversity is inflated by the 166 divergence shadow of the two haploid individuals that founded the population and we used this divergent 167 polymorphism to estimate the size of the source population from which this divergence shadow was 168 generated. Coalescent simulations of a hypothetical European source of fixed effective population size 169 which was bottlenecked to two (haploid) individuals shows us that an increase in effective population size, 170 increases the average divergence between haplotypes in the two bottlenecked individuals. We find that an 171 effective population size between 2.2 and 2.8 million individuals (closest median estimate = 2.5M) most 8 accurately replicates the observed diversity in present day European haplotypes (Fig. 4). Our estimate of the 173 size of the source of the European H. fraxineus population is equivalent to that from a single Japanese wood. 174 These estimates may reflect an expectation of the equilibrium diversity in any given H. fraxineus population 175 but importantly also suggest that the European invaders could have come from a single site, perhaps even as 176 a single ascocarp (fruiting body). 177 178

10% of H. fraxineus genes putatively interact with the host 179
The similarities between the effective population size of the Japanese and the estimated European-source 180 populations suggest that the levels of genetic diversity observed in this native population reflect that of a 181 representative European source. Effectors are a broad classification of proteins that interact with the host. 182 They are secreted by bacteria, oomycetes, fungi, nematodes and aphids and they disable host defence 183 components and facilitate colonisation. As such, effectors are in a co-evolutionary arms race with host 184 resistance genes 36 and therefore can be studied for insights into pathogen adaptive potential. We used 185 localisation signals of N-terminal presequences 37 to identify secreted proteins (i.e. putative effectors) 186 throughout the genome of H. fraxineus (Fig. 1, track 5). From a starting set of 11,111 predicted proteins 187 1,132 are predicted effectors, which we clustered into 566 tribes (Supplementary Table 5

Adaptive effector diversity is present in Europe, but is far greater in the native range 244
In sexually reproducing species genes may carry their own demographic history and here we can use 245 estimates of nucleotide diversity, SNP effects and non-synonymous to synonymous polymorphism to 246 understand the roles of these genes. We analysed the level of nucleotide diversity in all the genes, (including 247 putative effectors) of the European and Japanese populations (Supplementary Table 9). First, there remains 248 a significant positive correlation in the level of genetic diversity present in genes between the founder 249 European and Japanese populations (gene π, R 2 = 0.180, p < 0.001; effector π, R 2 = 0.144, p < 0.001; Fig. 5), 250 11 despite a significant (73%) reduction in nucleotide diversity in Europe (Wilcoxon (π), Jp all genes > EU all 251 genes: W(21885) = 102060000, p < 0.001; Fig. 6). This reduction in the level of nucleotide diversity has 252 impacted genes (other than putative effectors) and putative effector genes alike as they are no different 253 (Wilcoxon (π), EU genes ~ EU effectors: W(10942) = 5420600, p = 0.78; Fig. 6). Whereas, in Japan 254 putative effector genes maintain significantly greater nucleotide diversity than that of other genes (Wilcoxon 255 (π), Jp genes < Jp effectors: W(10942) = 5671200, p < 0.041; Fig. 6). 256 SNP effects can be characterised and we find that putative effectors in Europe have an increase in 257 SNPs found to affect splice donor sites, or found in splice regions, and 5' UTRs However, we expect that effectors will retain an overall higher ka:ks value despite dilution by purifying 12 selection operating over the majority of the rest of the gene. 277 Approximately 97% of all ka:ks values in the European and Japanese effectors are lower than one. 278 Within this range, we observe a significant increase in the effector ka:ks over that of other genes between 279 ω = 0.2 -0.5 (Fig. 7). This represents a reduction in the strength of purifying selection in effectors relative 280 to other genes. At ka:ks values greater than one, we observe a significant peak in the density for effectors 281 which indicates a number of effectors evolving under contemporary positive selection (Fig. 7). Effectors in 282 Japan have higher ka:ks values than other genes (Wilcoxon (ka:ks), Jp genes < Jp effectors: 283 W(10942) = 4891800, p < 0.001; Fig. 8). In Europe, effector adaptive diversity has been maintained despite 284 the bottleneck, albeit at a lower level than in the native range (EU genes < EU effectors: The bottleneck of H. fraxineus into Europe removed the majority of neutral and much of the adaptive genetic 297 variation. Despite this, the pathogen has decimated ash from East to West. Host defence is characterised in 298 terms of levels of susceptibility 56 and with native effector (and genome) diversity high, further introduction 299 of pathogen genetic diversity runs the risk of increasing disease prevelence 57 . Successive invasions can both 300 increase the level of genetic diversity above that of native populations and drive temporal fluctuations in 301 important pathogen diversity 27,58 . Finally, an increased virulence at invasion onset, due to the density of 302 13 uninfected hosts 59,60 , may be attenuated by natural selection on the pathogen as uninfected density reduces 61 303 but this is much less probable where continued gene flow is allowed. For all of these reasons, it is most 304 important to prevent further introduction of pathogen diversity to Europe. 305 Tree pathogens can be accidentally introduced because of commercial trade of live trees, wood and 306 wood products 1 and we are unable to manage host resistance in the same way that we do for crops 62 . If we 307 interpret the divergence between founder haploids to mean that multiple native genotypes can invade, we 308 face a situation in which the majority of genes (91.9%) from a single additional haploid individual from the 309 Japanese population would harbour a novel allele. Given the overall number of haplotypes present in Europe 310 and Japan, any further migration from the East is likely to be accompanied by extreme increases to the level 311 of adaptive polymorphism and disease severity of the European population. 312 Here we publish a high-quality H. fraxineus genome with annotation and we begin to analyse the 313 population genetics of the European invasion using a limited sample of the native range. Further population 314 genetics analyses including broader sampling of single isolates from the native range will allow robust 315 modelling of the source meta-population and the invasion scenario. Moreover, population genetics of de 316 novo assembled isolates can be used to unravel processes of genome evolution given the specifics of this 317 two-haploid introduction. Important for the understanding of the disease progression is the combination of 318 native effector nucleotide statistics with targeted RNA-seq experiments. 319 Notwithstanding the potential impact of an introduction of the emerald ash borer 18,63 , current levels 320 of European pathogen genetic diversity may infect 95% of all European ash. We must consider the 321 implications of further introduction of diversity from the ash dieback pathogen. 322 323

Hymenoscyphus fraxineus collection and sequencing 325
A total of 44 Hymenoscyphus fraxineus isolates from Europe (21 UK, 13 Norway, 5 France, 4 Poland, 1 326 Austria) and nine from Japan (one haploid and eight fruiting bodies) were collected and cultured on malt 327 extract agar 13 . Species confirmation was performed by assignment using PCR and sequencing of ITS 328 14 sequences. DNA was sequenced at either Edinburgh Genomics or The Earlham Institute on MiSeq and 329 HiSeq Illumina platforms to between 25-160 sequence depth (Supplementary Table 1). 330 Information 2). 339

Genome Assembly and annotation of H. fraxineus
The annotation pipeline used for genome Hf-v1.1 29 was replicated on Hf-v2 adding newer RNA-seq 340 data available via the Open Ash Dieback repository (Supplementary Information 3). The Hf-v2 genome was 341 repeat masked using de novo modelled repeats, known fungal and low complexity repeats 342 (REPEATMODELER v1.0.8, REPEATMASKER v4.0.5 68 ). Annotations (9,737) from Hf-v1.1 were transferred to 343 Hf-v2 using GMAP 69 . AUGUSTUS v2.7 70 was used to predict additional protein-coding genes using protein 344 and RNA-seq alignments (16 libraries , Supplementary Table 16) with the previously trained 345 Hymenoscyphus fraxineus model. We ran a transcript extension protocol on all genes to identify whether 346 those genes within 100bp of an upstream gene had evidence for extension. In addition, we checked this 347 extended gene dataset for potential secreted proteins. This protocol allowed the identification of 36 genes 348 that had not previously been recognised as having a methionine start codon which were included into our 349 effector-mining pipeline. 350 We used the pipeline described by Saunders et al. 29,37 to identify putative effectors. Briefly, 351 SIGNALP4 (-t euk -s notm -u 0.34) was run to identify transcripts with a signal peptide and from these 352 TMHMM and TARGETP were used to remove transcripts with transmembrane or mitochondrial localisation 353 signals 71 . Finally, secreted proteins were clustered into tribes using TRIBEMCL v12.135 (e-value 10 -5 ) 72 . 354 We used T-REKS 73 to identify repeats, the Disulfide algorithm 74 to predict disulfide bonds and 355 PREDICTNLS 75 to predict nuclear localisation signals. 356

Mapping and SNP calling 357
Reads were trimmed for Illumina adapters and Phred quality (-q20) and discarded where length was reduced 358 below 80bp (TRIM GALORE v0.3.3; Babraham Institute, Cambridgeshire, UK). Read alignment and mapping 359 were performed using BWA-MEM v0.7.7 76 . SAMTOOLS v0. 1.19 and BCFTOOLS v0.1.19-44428cd 77 were 360 used to sort, remove duplicate reads, mpileup (-D) and call variants (bcftools view -cg). VCFTOOLS 361 v0.1.13 78 was used to filter variants to a minimum depth of ten, maximum depth of 1.8 × mean individual 362 depth and a SNP genotype quality of at least 30. SNP sites with more than two alleles were excluded as 363 likely errors, as were sites reported as heterozygous in haploids. Finally, sites that were missing in 20% or 364 more individuals were also removed. Three samples were removed from further analyses because of 365 insufficient depth or evidence of contamination (20 UK, 13 Norway, 5 France, 4 Poland, 1 Austria, 8 Japan 366 (1 haploid 7 fruiting bodies); see Supplementary Information 1). 367 Phasing fruiting body samples 368 SHAPEIT2 79 , used to phase fruiting body data, first applies a phase informative read step (extractPIRs) to 369 group SNPs present on the same read pair and then phases remaining SNPs using population level 370 polymorphism in the second step (assemble --states 1000 --burn 60 --prune 60 --main 300 --effective-size 371 564000 --window 0.5). All Japanese samples including the haploid strain were used in the phasing step. As 372 part of the phasing process indels must be removed from the dataset. Effective population size (ϑ=2Neμ) was 373 calculated assuming a mutation rate of 5×10 -9 per generation per site 80 using the mean nucleotide diversity 374 present in 100Kbp windows in the Japanese population. 375

SNP diversity and divergence analysis 376
Polymorphism statistics and sliding windows i.e. sites per individual, SNP depth, transition/transversion 377 (Ts/Tv) ratio, linkage (--thin 1000; --ld-window-bp 500000), π and FST were conducted using VCFTOOLS 378 show directionality in the invasion and portray the bottleneck behind Tajima's D. In addition, we correlated 399 feature coverage (i.e. repeat content) of regions with gene and effector coverage to understand genome 400 evolution. Chi-squared tests were used to assess linkage between scaffolds in the offspring of a parental 401 cross. Two-sample Wilcoxon tests (one tailed indicated by '>' or '<'; two tailed indicated by '~'), were used 402 to test for differences in signals of nucleotide diversity, Tajima's D and ka:ks between non-effector genes 403 and effectors in European and Japanese populations. Bootstrap analyses were used to compare variants effect 404 by feature or gene (e.g. SnpEff output) and involved sampling feature with replacement (×1000). 405 Bootstrapped comparisons are tested using a randomisation test and 95% confidence intervals (CI) are 406 Japan and Europe even after a population bottleneck. This reduction in nucleotide diversity for both genes 503 and putative effectors in Europe is not significantly different between these groups of genes. 504 is significantly higher for effectors over a considerable range of the distribution between zero and one 515 (p < 0.001). This is evidence for reduced efficacy of purifying selection in these genes. ka:ks greater than 516 one (less than two; inset) also contains a peak for effector genes which is consistent with the operation of 517 positive selection driving adaptive change in these effectors (p = 0.01). The signal of polymorphism visible to natural selection is greater in effectors than other genes in both Japan 523 and Europe. However, the strength of this signal has been reduced in the European population. 524 Synonymous site density is significantly higher in effectors in both the European and Japanese populations. 525