Genomic signatures of somatic hybrid vigor due to heterokaryosis in the oomycete pathogen, Bremia lactucae

Lettuce downy mildew caused by Bremia lactucae is the most important disease of lettuce globally. This oomycete pathogen is highly variable and has rapidly overcome resistance genes and fungicides deployed in attempts to control it. The described high-quality genome assembly of B. lactucae provides the foundation for detailed understanding of this economically important pathogen. The biotrophic nature of B. lactucae coupled with high levels of heterozygosity and the recently expanded repeat content made genome assembly challenging. The combined use of multiple read types, including synthetic long reads, single molecule sequences, and Hi-C, resulted in a high-quality, chromosome-scale, consensus assembly of this diploid organism. Phylogenetic analysis supports polyphyly in the downy mildews consistent with the biotrophic mode of pathogenesis evolving more than once in the Peronosporaceae. Flow cytometry plus resequencing of 30 field isolates as well as sexual offspring and asexual derivatives from multinucleate single sporangia demonstrated a high incidence of heterokaryosis in B. lactucae. Heterokaryons have phenotypic differences and increased fitness compared to homokaryotic derivatives. Consequently, B. lactucae exhibits somatic hybrid vigor and selection should be considered as acting on a population of nuclei within coenocytic mycelia. This provides evolutionary flexibility to the pathogen enabling rapid adaptation to different repertoires of host resistance genes and other challenges. The advantages of asexual persistence of heterokaryons may have been one of the drivers of selection that resulted in the loss of uninucleate zoospores in multiple downy mildews.

sojae has the largest reported gene space within the Peronosporaceae at 37.7 Mb (Supplementary 151 Expression inferred by presence in the transcriptome assembly was detected for 109 of these 157 candidate RxLR effectors, 35 of which also had an EER motif or WY domain (Table 3). In addition to 158 the 161 RxLR candidates, 26 predicted secreted proteins and 19 proteins lacking a secretion signal 159 had one or more WY domains but no detectable RxLR motif. Of these, 19 WY proteins lacking an 160 identifiable RxLR motif with a signal peptide and 13 without a signal peptide were detected in the 161 transcriptome (Table 3). Interestingly, an EER or EER-like motif was detected in the first 100 residues 162 from 29 of the 45 WY proteins that lacked an RxLR motif, 20 of which were predicted to be secreted. 163 This is consistent with not all effectors requiring an RxLR motif for translocation in to the host cell, 164 similar to previously reported effectors in animal pathogenic oomycetes (58, 59). Two putative 165 secreted Crinklers (CRNs) (60, 61) were annotated, one of which also contained an RxLQ and DDR 166 motif. An additional 74 CRNs lacking a secretion signal were identified, although only six of these 167 were present in the transcriptome assembly (Table 3). Four of these six had the canonical LFLAK 168 motif and the other two had a LYLA motif (60, 61). Together, these candidate effectors comprise 169 1.9% of all genes annotated in B. lactucae. Orthologs of all proteins which have previously been 170 described as inducing a host response were detected in the draft assembly (Supplementary Table 3 Table 4). This is lower than the 173 proportion reported for Phytophthora spp. (2.6 to 3.6%) and consistent with observations for other 174 downy mildews where 1.3 to 1.7% of total annotated proteins had putative pathogenicity domains 175 (47). 176 The majority of genes encoding flagella-associated proteins and calcium-associated domains 177 were missing from the B. lactucae genome. B. lactucae has lost 55 of 78 orthogroups that contain 178 flagellar proteins ( Supplementary Fig. 2). One hundred and twelve proteins from P. infestans were 179 present in these orthogroups; 78 of these proteins were absent in B. lactucae. This is similar to 180 assemblies of other non-flagellate downy mildews that had 34 to 48 proteins in these orthogroups 181 ( Supplementary Fig. 2). This is consistent with the loss of zoospore production by B. lactucae. There 182 was also a significant loss of calcium-associated domains, which is also observed in the assemblies of 183 other non-flagellate downy mildews. B. lactucae had no proteins present in 125 of the 177 calcium-184 associated orthogroups similar to other non-flagellates, which ranged from 118 to 125. These 185 orthogroups contained 53 proteins from B. lactucae compared to 193 proteins in P. infestans. Other 186 non-flagellate species had 52 to 59 proteins assigned to these orthogroups ( Supplementary Fig. 2). 187 The parallel loss of zoospore production and proteins with calcium-associated domains in both 188 clades of downy mildews (Fig. 3) is consistent with the involvement of these proteins in 189 zoosporegenesis (65). Genes encoding carbohydrate binding, transporter, and pathogenicity 190 associated domains were also under-represented in B. lactucae as previously reported for other 191 downy mildews in both clades (47). This provided further evidence for the convergent loss of genes 192 encoding these domains during adaptation to biotrophy. 193 The majority of annotated genes had levels of coverage close to the average sequencing 194 depth ( Supplementary Fig. 3), indicating that most genes were each assembled into a single 195 consensus sequence. A minority of genes had a normalized read depth equal to half the sequencing 196 coverage, consistent with divergent haplotypes that had assembled as independent sequences. The 197 BUSCO (46) genes had the same distribution. However, genes encoding candidate effectors had 198 variable coverage; this could have been due to a disproportionate number of effector haplotypes 199 being assembled independently and/or a high rate of divergence between haplotypes resulting in 200 poor mapping rates. 201

Genomic signatures of heterokaryosis 202
Distinct alternative allele frequency profiles were detected in multiple isolates of B. lactucae 203 ( Fig. 4 and Supplementary Fig. 4). Such analysis had previously been used to support polyploidy in P. 204 infestans (14, 66). The profiles of thirteen isolates, including the reference isolate SF5, were clearly 205 unimodal, seven isolates were trimodal, and nine isolates were trimodal ( Supplementary Fig. 4). Two 206 other isolates had profiles that were not clearly bimodal or trimodal ( Supplementary Fig. 4). The 207 symmetrical unimodal distribution of SF5 was consistent with a diploid genome; the other 208 distributions were not. However, the genome size for all isolates as measured by flow cytometry 209 varied by less than 3%. In the case of polyploidy, the genome size of triploids and tetraploids would 210 be 150% and 200% that of the diploid, respectively; therefore, there was no evidence for polyploidy 211 in B. lactucae (Fig. 1 a and Table 1). The outcross origin of these progeny was confirmed by the presence of 217 unique combinations of SNPs inherited from each parent. Therefore, these progeny isolates could 218 not have arisen by apomixis or selfing and all sexual progeny from this unimodal x trimodal cross 219 were diploid. The origins of the gametes in this cross were determined for 38 progeny isolates that 220 had been sequenced to sufficient depth. Pairwise SNP-based kinship coefficients revealed two 221 distinct half-sib families of 29 and 9 individuals (Fig. 5). Therefore, three rather than two nuclei 222 contributed gametes in this cross. The trimodal alternative allele frequency plot and flow cytometry 223 of C82P24 are consistent with this isolate being heterokaryotic with two diploid nuclei. 224 To confirm that C82P24 was heterokaryotic rather than a mixture of two isolates, 20 asexual 225 derivatives were generated from single sporangia. Kinship of these 20 isolates was as high between 226 one another as with the original isolate, indicating they were identical. Furthermore, all asexual 227 derivatives of C82P24 displayed similar relatedness to all sexual progeny (Fig. 5). Sequencing of 11 228 asexual derivatives to >50x coverage demonstrated that they retained the trimodal profile, 229 indicating that two distinct nuclei were present in each derivative ( Supplementary Fig. 6) with a 230 diploid size of 303 +/-3 Mb as measured by flow cytometry (Supplementary Table 5). Therefore, 231 C82P24 was heterokaryotic rather than a mixture of isolates. 232 To demonstrate heterokaryosis in another isolate, 10 asexual derivatives were also 233 generated from isolate C98O622b, which displayed a trimodal alternative allele frequency. In this 234 case, kinship analysis revealed three distinct groups of derivatives (Fig. 6i). Derivatives A to F had a 235 high kinship and an identical virulence phenotype to C98O622b (Fig. 6i, ii). Derivatives G to I and 236 derivative J had lower kinship to C98O622b than derivatives A to F. The lowest kinship was between 237 derivatives G to I and J (Fig. 6i). Virulence phenotypes varied between but not within groups (Fig. 6ii); 238 C98O622b and derivatives A to F were virulent on both Dm4 and Dm15; derivatives G to I were 239 avirulent on Dm4 and virulent on Dm15, while derivative J was conversely virulent on Dm4 and 240 avirulent on Dm15 (Fig. 6ii). The single-spore derivatives of C98O622b were sequenced to >50x 241 coverage to determine their nuclear composition. Derivatives A to F were trimodal (Fig. 5iii, 242 Supplementary Fig. 7); all other derivatives were unimodal, which is consistent with the separation 243 of the heterokaryon into its diploid components (Fig. 6iii). This conclusion was supported by 244 combining read sets in silico. Combining reads of derivatives G-I did not increase their relatedness to 245 C98O622b, while combining reads of derivatives G, H, or I with those of J resulted in a high kinship to 246 C98O622b (Fig. 6i) and trimodal profile similar to C98O622b ( Fig. 6iii and Supplementary Fig. 8). 247 Therefore, C98O622b was also heterokaryotic; however, unlike C82P24, C98O622b was unstable and 248 could be separated into constituent homokaryotic derivatives by sub-culturing from single 249 sporangia. 250 The trimodal distributions of the derivatives A to F were not identical and could clearly be 251 split into two configurations. Derivatives B and F were similar to C98O622b, displaying peaks at 252 approximately 0.25, 0.5, and 0.75 (Fig. 6iii). The other four heterokaryotic derivatives A, C, D, and E 253 had peaks at approximately 0.33, 0.5, and 0.67 (Fig. 6iii). The nuclear composition of these 254 heterokaryotic derivatives was investigated by subsampling SNPs identified as unique to each 255 homokaryotic derivative (G to J). This revealed that in the trimodal distribution of derivatives B and F 256

Somatic hybrid vigor due to heterokaryosis 269
To investigate the potential benefits of heterokaryosis, the fitness of asexual derivatives of 270 C98O622b was assessed on a universally susceptible cultivar and two differential host lines. 271 Derivatives A and B were selected to represent unbalanced and balanced heterokaryons, 272 respectively, while derivatives I and J represented the two homokaryons. When grown on the 273 universally susceptible lettuce cv. Green Towers, the heterokaryotic derivatives grew faster than 274 either homokaryotic derivative. The balanced heterokaryotic derivative B was significantly fitter than 275 the homokaryotic derivative I (Fig. 7 a). There was no significant difference within heterokaryotic 276 derivatives or within homokaryotic derivatives when grown on cv. Green Towers. Therefore, the 277 heterokaryotic isolates were fitter when unchallenged by host resistance genes. However, when a 278 product of either nucleus of the heterokaryon was detected by a resistance gene (i.e. Dm4 in 279 R4T57D or Dm15 in NumDM15) that differentiates the homokaryotic derivatives ( Fig. 6 ii), the 280 heterokaryotic derivatives were less vigorous than the virulent homokaryotic derivative (Fig. 7 b). 281 This suggested that it may be possible to break a heterokaryon by repeated subculture on a selective 282 cultivar, as reported previously (10). When the heterokaryotic derivatives were inoculated onto an F1 283 hybrid of the selective lines expressing both Dm4 and Dm15, neither the heterokaryotic nor 284 homokaryotic derivatives were able to grow. Therefore, combining multiple resistance genes against 285 the entire B. lactucae population into a single cultivar remains a potentially effective strategy to 286 provide more durable resistance to the pathogen. 287 Heterokaryosis in B. lactucae has phenotypic consequences as well as implications for 288 interpretation of tests for virulence phenotype. Derivatives G, H, and I are race Bl:5-CA and 289 derivative J has a novel virulence phenotype. The heterokaryotic field isolate C98O622b is race Bl:6-290 CA, indicating that two phenotypically distinct isolates may combine to create a new phenotype 291 when characterized on individual resistance genes; such somatic hybrids may not be able to 292 overcome combinations of these resistance genes in a single cultivar. Therefore, reactions of 293 monogenic differentials are not necessarily a good predictor of virulence when heterokaryons are 294 tested. The instability of heterokaryosis may enable a successful infection and proliferation of 295 individual nuclear components. Furthermore, there is no a priori reason why coenocytic mycelia are 296 limited to having only two nuclear types. Multiple rounds of somatic fusion are possible if favored by 297 selection. Allele frequency plots are consistent with some isolates having more than two nuclei (e.g. 298 isolate C04O1017; Fig. 4 c). Therefore, heterokaryotic isolates should be considered as exhibiting 299 somatic hybrid vigor and selection for heterosis in B. lactucae as acting on populations of nuclei 300 within a coenocytic mycelium (Fig. 8) rather than on individual isolates. 301

Heterokaryosis in other oomycetes 302
Heterokaryosis may be a common phenomenon in other oomycetes that has yet to be 303 investigated extensively. Flow cytometry revealed heterogeneous nuclear sizes in mycelia of P. 304 infestans, although stability over multiple asexual generations was not reported (19). Somatic fusion 305 may be a route to allopolyploidy; inter-species somatic fusion could result in transient 306 heterokaryosis before nuclear fusion to form a somatic allopolyploid circumventing gametic 307 incompatibility. Somatic sporangial fusions have also been reported in a "basal" holocarpic 308 oomycete (67) demonstrating the possibility of widespread heterokaryosis within the family. 309 Heterokaryosis may be more prevalent in non-zoospore producing oomycetes. Production of 310 zoospores with single nuclei during the asexual cycle, exhibited by many oomycetes breaks the 311 heterokaryotic state each asexual generation (21, 23). However, some downy mildews and 312

Isolation, culturing, and DNA extraction 326
Bremia lactucae isolate SF5 has been reported previously (20,40,41). Additional 327 field isolates surveyed in this study were either isolates collected from California/Arizona between 328 1982 and 2015 or were supplied by Diederik Smilde (Naktuinbouw, The Netherlands). Sexual 329 progeny of SF5 x C82P24 were generated as described previously (40, 41). Single-spore isolates were 330 derived from cotyledons that had been sporulating asexually for 1 to 2 days (6 to 7 days post-331 infection). A single cotyledon was run over a 0.5% water agar plate until clean of spores. Single 332 conidia were located under a dissection microscope, pulled off the agar using pipette tips, and 333 ejected onto fresh, 7-day old cotyledons of cv. Green Towers that had been wetted with a drop of 334 deionized water. Plates were incubated at 15˚C with 12 hour light/dark periods. Successful single-335 spore infections were transferred to cv. Green Towers seedlings and maintained thereon. Fitness 336 was determined by measuring the rate of B. lactucae sporulation of four replicates of four isolates 337 on 20 cotyledons at 3, 5, 6, 7, and 9 days post-inoculation (dpi) on cv. Green Towers. The area under 338 the curve was calculated for each replicate and significance tested using a two-tailed t-test with 339 Holm adjustment. Additional fitness tests of heterokaryons were performed on an F1 hybrid of 340 NumDm15 and R4T57D, which confer resistance phenotypes Dm15 and Dm4, respectively (68). The 341 virulence phenotype was determined by inoculation onto the IBEB EU-B standardized differential set 342 (http://www.worldseed.org/wp-content/uploads/2016/05/ Table-1_IBEB.pdf) and observed for 343 sporulation at 7, 11, 15, and 21 dpi. Microscopy was performed on ~2-week old seedlings of lettuce 344 cv. Green Towers, 5 dpi with B. lactucae isolate C16C1909 (Fig. 8a) or ~2 week old seedlings of 345 lettuce cv. Cobham Green homozygous for the AtUBI::dsRED transgene, 7 dpi with B. lactucae isolate 346 C98O622b (Fig. 8b). Fig. 8a was captured with a Leica TCS SP8 STED 3X inverted confocal microscope 347 using a 40x water immersion objective. Image processing was performed using Huygens Professional 348 (https://svi.nl/Huygens-Professional) and Bitplane Imaris (http://www.bitplane.com/). Fig. 8b was 349 captured using a Zeiss LSM 710 laser scanning confocal microscope using a 40x water immersion 350 objective. Z stacks were processed and combined into a single image using the ZEN Black software. 351 Spore pellets of all isolates sequenced were obtained by washing sporangia from infected lettuce 352 cotyledons in sterile water. Spore suspensions were concentrated by centrifugation in 15 mL tubes, 353 resuspended, transferred to microfuge tubes, pelleted, and stored at -80°C until DNA extraction 354 following a modified CTAB procedure (69). Quantity and quality of DNA was determined by 355 spectrometry as well as estimated by TAE gel electrophoresis. 356 Multiple assembly approaches were tried using a variety of templates. Ultimately, the 390 genome of isolate SF5 was assembled using a hybrid approach using several types of sequences 391 ( Supplementary Fig. 10). Moleculo reads were assembled using Celera (72) and further scaffolded 392 using mate-pair, fosmid-end, BAC-end, and PacBio data utilizing first SSPACE v3.0 (73) followed by 393 AHA (74). A consensus assembly was obtained by removing the second haplotype using 394

Library preparation and sequencing
Haplomerger2 (75). Misjoins were detected and broken using REAPR v1.0.18 in 'aggressive mode' 395 (76). Mitochondrial sequences were detected by BLASTn v 2.2.28 and removed before final 396 scaffolding and gap-filling (73, 77). Hi-C scaffolding was performed by Dovetail Genomics using their 397 Hi-Rise pipeline to infer breaks and joins. One putative effector gene was masked by Ns in the 398 assembly because it was determined by read coverage to be erroneously duplicated multiple times. 399 The quality of the assembly was assessed in multiple ways. Assembly completeness and 400 duplication was measured by BUSCO v2, protist ensemble library db9 (46)   assessing genome assembly and annotation completeness with single-copy orthologs. 598 Bioinformatics 31, 3210-3212 (2015). 599  Black: the distribution of k-mers present in the read set but absent in the assembly. Red: K-mers 752 present in the read set and once in the assembly. Purple: K-mers present in the reads set and twice 753 in the assembly. The first peak depicts heterozygous k-mers and the second peak depicts 754 homozygous k-mers. A high-quality consensus assembly will contain half the k-mers in the first peak, 755 the other half of which should be black due to heterozygosity, and all the k-mers in the second peak 756 should be present only once, which therefore should be red. Very few duplicated k-mers were 757 detected in the SF5 assembly. K-mers derived from repeat sequences have higher multiplicity and 758 are not shown. 759 Phytophthora (P. infestans, P. ramorum, and P. sojae) assemblies. Statistics of these assemblies are 763 included in Table 1. LTR elements of B. lactucae are younger than elements in other downy mildew 764 assemblies. b) Counts of unique LTR-RTs harvested and annotated from each genome surveyed. 765 Larger assemblies (Table 1) are observed as having higher counts of LTR-RTs. Bars are ordered by the 766 percent of the genome masked displayed in panel c. Only partial and no full elements could be found 767 for P. halstedii. c) Scatterplot demonstrating the percentage of the assembly sequence that is 768 masked by annotated LTR-RTs and partial elements. Colors and order are retained from panel b. The 769 percentage of the assembly masked increases with assembly size. B. lactucae is an outlier as it has a 770 medium assembly size, but the highest masked percentage. 771  progeny generated by crossing SF5 (homokaryotic) with C82P24 (heterokaryotic). The first square 791 delineates the majority of the offspring as one group of siblings derived from the same two parental 792 nuclei (homokaryon 1, HK1). The second square delineates the remaining offspring as a second 793 group of siblings derived from a different nucleus in C82P24 (homokaryon2, HK2). Relatedness of 794 these two groups is consistent with having one parental nucleus in common derived from SF5. 795 Relatedness of single-spore asexual derivatives of both isolates is also shown. Single-spore 796 derivatives of C82P24 had a high relatedness to all other C82P24 derivatives and the original isolate. 797 These derivatives and C82P24 were equidistant to all offspring, indicating that both nuclei in the 798 heterokaryon contributed to the offspring and that the heterokaryotic C82P24 isolate had not been 799 separated into homokaryotic components by generating single-spore derivatives. 800 analysis of ten asexual single-spore derivatives of C98O622b placed them into three genomic groups. 802 One group of derivatives, A to F, were heterokaryotic and highly similar to C98O622b. The other two 803 groups, derivatives G to I and derivative J, were each homokaryotic, less similar to C98O622b than 804 the heterokaryotic group was, and even less similar to each other. Combining reads in silico of 805 isolates G to I did not change their relatedness to other isolates; combining reads of any of G to I 806 with J scored similarly high in relatedness to C98O622b as derivatives A to F. (ii) Phenotypic 807 differences between heterokaryotic and homokaryotic derivatives of C98O622b compared to the 808 original isolate. Derivatives A to F were virulent on both Dm4 and Dm15; however, derivatives G to I 809 were avirulent on Dm4 and virulent on Dm15, while derivative J showed the reverse virulence