Genome Analysis of a Zygomycete Fungus Choanephora cucurbitarum Elucidates Necrotrophic Features Including Bacterial Genes Related to Plant Colonization

A zygomycete fungus, Choanephora cucurbitarum is a plant pathogen that causes blossom rot in cucurbits and other plants. Here we report the genome sequence of Choanephora cucurbitarum KUS-F28377 isolated from squash. The assembled genome has a size of 29.1 Mbp and 11,977 protein-coding genes. The genome analysis indicated that C. cucurbitarum may employ a plant pathogenic mechanism similar to that of bacterial plant pathogens. The genome contained 11 genes with a Streptomyces subtilisin inhibitor-like domain, which plays an important role in the defense against plant immunity. This domain has been found only in bacterial genomes. Carbohydrate active enzyme analysis detected 312 CAZymes in this genome where carbohydrate esterase family 6, rarely found in dikaryotic fungal genomes, was comparatively enriched. The comparative genome analysis showed that the genes related to sexual communication such as the biosynthesis of β-carotene and trisporic acid were conserved and diverged during the evolution of zygomycete genomes. Overall, these findings will help us to understand how zygomycetes are associated with plants.

be used to characterize the lifestyles of fungi. Global CAZyme investigations of a kingdom of fungi showed that necrotrophic pathogens have more CAZymes than other ecotypes such as biotrophs and saprobes 29 . Interestingly, biotrophic Ustilago maydis has an extremely decreased number of total CAZymes, but instead the gene clusters of secreted virulence factors were found in its genome. This suggests that CAZymes are not the only factors that determine plant pathogenicity in fungi. Secreted effectors have also been investigated because plant pathogenic fungi interact with host cell death machinery via these effectors 30,31 .
In pathogen-host interaction, genome evolution via sexual communication is important to host adaptation. β -carotene derivatives, particularly trisporoids, have been identified to be responsible for partner recognition and early sexual differentiation in zygomycetes 32 . Three genes, tsp1, tsp2, and tsp3, are associated with trisporic acid biosynthesis that encode 4-dihydromethyltrisporate dehydrogenase, 4-dihydrotrisporin dehydrogenase, and carotene oxygenase, respectively 32 . AcaA encodes an additional carotene cleavage oxygenase acting on the cleavage product of β -carotene made by tsp3-encoded oxygenase in Phycomyces 33 . The biosynthesis of β -carotene is well-conserved in Mucorales and is mediated by two genes, carRA and carB 34 .
Here were report the genome sequence of the plant pathogenic Mucorales fungus Choanephora cucurbitarum. We used ten Mucorales and two Glomerales genomes as comparatives: one Absidia, two Lichtheimia, two Mucor, one Parasitella, two Rhizopus, two Umbelopsis, and two Rhizophagus. We also used the genomes of three plant pathogenic fungi to explore the unique genomic features in the C. cucurbitarum genome in perspective of plant infection strategies; i.e., two hemibiotrophic ascomycetes, Colletotrichum species and one biotrophic basidiomycete, Ustilago maydis.

Results and Discussion
Genome assembly and gene prediction. We sequenced and assembled the genome of C. cucurbitarum KUS-F28377 isolated from green squash in Korea. On the basis of high-quality reads (18.8 million reads with 4.3 billion bases), we assembled a genome of 29.1 Mbp with 2,814 scaffolds. The estimated genome size based on the k-mer frequency was 29.2 Mbp, thereby indicating that 99.8% of the entire genome was covered by the assembly (Supplementary Figure S1). The N50 values for the contigs and scaffolds were 24.2 kbp and 27.9 kbp, respectively, and the sequence coverage was 81.3-fold. The read-depth coverage and GC-content profile for each scaffold showed no indication for the sequences of contaminants, symbionts, or parasites in the final assembly (Supplementary Figure S2). The genome sizes of Mucorales varied, ranging from 21.9 Mbp (U. isabellina B7317) to 49.6 Mbp (R. microsporus), implying that genomic gain and loss have extensively occurred during the evolution of Mucorales genomes. It should be noted that the taxon R. delemar has been integrated into R. arrhizus as a synonym (http://www.indexfungorum.org/names/NamesRecord.asp?RecordID= 162898), but we used R. delemar to refer to this species in this study.
In total, 11,977 protein-coding genes were predicted in the C. cucurbitarum genome with an average length of 1,194 nt. The numbers of predicted genes in Mucorales genomes also varied, ranging from 9,082 (U. isabellina B7317) to 22,427 (R. microsporus). The protein length distribution indicated a slight expansion of short proteins around 100 amino acids (aa) (Supplementary Figure S3). The completeness of the genome assembly and gene predictions were estimated using Benchmarking Universal Single-Copy Orthologs (BUSCO) 35 , which indicated that 97.3% (1,399 entries) of all 1,438 single-copy orthologs were detected in this genome. Table 1 Figure S4). The degree of gene sharing was consistent with the relationships in the phylogenetic tree shown in Fig. 1 (Supplementary Table S2 shows the counts for the shared gene families in all 13 genomes). The number of members per gene family provided little evidence of the whole-genome duplication events, as found in Rhizopus genomes 19 , which was consistent with the smaller genome size of the C. cucurbitarum genome (Supplementary Figure S4 and Supplementary Table S3). The C. cucurbitarum genome contained 1,326 unique genes (11.11% of overall genes) where their orthologs were absent in any of the comparative genomes used in this study, and they were designated as "orphan genes". Among 1,326 orphan genes, 1,113 genes belonged to single-member gene families and 213 genes belonged to multi-membered gene families. The second largest gene family for the orphan genes comprised 31 members containing the F-box domain (PF12937), which is related to the ubiquitin proteolysis mechanism 36 . Five orphan genes were annotated with CAZymes: CE4 (acetyl xylan esterase, EC 3.1.1.72), CBM14 (chitin-binding module), GH31 (α -glucosidase, EC 3.2.1.20), GH37 (α ,α -trehalase, EC 3.2.1.28), and AA6 (1,4-benzoquinone reductase, EC. 1.6.5.6) modules. These unique CAZymes may determine the host specificity of this organism. Interestingly, the orphan proteins had shorter lengths than those in the overall proteome, where the median lengths of the orphan proteins and all proteins were 107 aa and 318 aa, respectively (Table 2 and Supplementary Figure S5). This bias still needs to be explained. GO annotations showed that the functions of "protein binding, " "oxidation-reduction process, " "ATP binding, " "integral component of membrane, " "serine-type endopeptidase inhibitor activity, " and "metal ion binding" were frequent in these orphan genes.
To identify effector candidates, we predicted 657 genes containing a signal peptide for transport outside of cells. In particular, 97 secreted protein-encoding genes were orphan genes that could potentially function as host-specific effectors, where 84 of these lacked any known functional domain, which is consistent with the current rule for defining effector genes as secreted proteins without known functional domains 30 . The other 13 genes included protease inhibitor domain-and catalase domain-containing genes involved in defense against plant immunity.
Comparative genome analysis showed a various number of orphan genes, ranging from 188 to 8,396, which represents the diverse lifestyles of Mucorales fungi adapting to ecological niches. Orphan genes were often highly duplicated. For instance, M. ambiguus genome had an orphan gene family with 69 members. Majority of duplicated orphan genes were not annotated with Pfam or CAZymes. The ortholog groups and the functional annotation of the orphan genes are listed in Supplementary Tables S3 and S4, respectively. Intriguingly, secreted proteins were more likely to be orphan proteins in the C. cucurbitarum genome (P value = 0.002 in the chi-square test). This finding suggests that effector genes are under strong selective pressure, which leads to more variation than other genes 30 . This observation was also identified in the comparative genomes except in R. irregularis DAOM 181602 and U. isabellina NBRC 7884. The orphan genes and statistical tests for secreted orphan proteins are summarized in Table 2 and Supplementary Table S6.

Pfam domain annotation and bacterial Streptomyces subtilisin inhibitor (SSI)-like domain.
We used Pfam annotations to identify unique or enriched functional domains in the C. cucurbitarum genome. We identified 12,767 Pfam domains within 7,936 genes, which was lower than the numbers in the comparative genomes (11,810-28,947 domains, Supplementary Table S7). However, the numbers of unrepeated Pfam entries were similar (3,504 for C. cucurbitarum and 3,249-3,676 for the others, Supplementary Figure S6). Indeed, around 65% of the Pfam entries (2,284 entries) were shared by all 13 genomes. In addition, the most frequent Pfam domains were common in all 13 genomes: "protein kinase domain," "WD domain," "RNA recognition motif, " "major facilitator superfamily, " and "helicase conserved C-terminal domain" (Supplementary Figure S7). These observations suggest that they share most functions and that small variations determine their lifestyles. The C. cucurbitarum genome contained 14 unique, 11 enriched, and 18 depleted Pfam entries (P < 0.01, Fisher's exact test) compared with the other 12 genomes (Fig. 2).
The most notable enriched domain was the SSI-like domain (PF00720), where 11 copies were present (Fig. 2). It has been reported that only Actinobacteria harbor this domain, which is found mostly in Streptomyces spp. 37 . We found that only one other Mucorales fungus, Absidia glauca, contains three SSI-like proteins among all the fungal genomes deposited in NCBI. The lengths of the C. cucurbitarum SSI-like proteins were relatively short, ranging from 109 aa to 138 aa, and they were encoded by a single exon, except for one containing two exons. Only 6/11 SSI-like genes were supported by mRNA-seq in the hyphal state, which indicates that they might be differently regulated. Eight of the SSI-like proteins had extracellular localizations in the same manner as bacterial homologs (6/8 bacterial SSI-like proteins investigated). The three non-extracellular SSI-like proteins were destined for the endoplasmic reticulum, mitochondria, and cytosol, respectively, thereby suggesting that they may have various roles in cells.
There were clade-specific conserved positions as well as universally conserved positions in SSI-like protein sequences. We aligned a total of 19 SSI-like protein sequences (11 from C. cucurbitarum and eight from bacteria) and identified 35 conserved positions in all 19 sequences. The 16 and 12 positions were conserved only in C. cucurbitarum and bacteria, respectively (Fig. 3a). Four cysteine positions that form two disulfide bonds were well conserved in all sequences (positions 107, 122, 144, and 175 in Fig. 3a), which indicates that these disulfide bonds are important for correct structure formation in both groups 37 .
We examined the phylogenetic relatedness between C. cucurbitarum and bacterial SSI-like domains by building phylogenetic gene trees. The neighbor-joining tree showed the nestedness among the SSI-like domains of C. cucurbitarum and three bacterial species: Thermobifida, Actinosynnema, and Nocardia ( Fig. 3b and Supplementary Figure S8). The maximum-likelihood tree built with the same dataset also supported similar grouping pattern (Supplementary Figure S9). Despite that the bootstrap values of branching nodes were weak in the trees to pinpoint the exact phylogenetic relationship, the tree analysis implied that SSI-like domains of C. cucurbitarum have been horizontally transferred. These SSI-like paralogs probably help to neutralize the host's immunity because subtilisin proteases have functions in pathogen recognition in plants 38 . Distributions of CAZymes and the CE6 family. The C. cucurbitarum genome contained fewer CAZymes in total than dikaryotic necrotrophs. This genome contained 312 CAZymes whereas other dikaryotic necrotrophs

Attributes
Ccu  contained 419-865 CAZymes with an average of 605 as previously reported 29 . This number is lower than the average number of CAZymes in hemibiotrophs (542 CAZymes) and is similar to that found in symbiotic and biotrophic plant pathogens (328 and 313 CAZymes, respectively). Plant pathogens have been known to have more CAZymes than saprobes and biotrophs 29,39 because they need to degrade plant biomass to facilitate their invasion. Thus, this suggests that not all plant pathogens have expanded CAZyme modules and different plant pathogenicity factors, such as secreted effector proteins 40 , would lead to dissimilar CAZyme profiles. Interestingly, a decreased number of total CAZymes was also observed in other Mucorales with an average of 342 CAZymes, except for R. microsporus with 585 CAZymes, which might be attributable to whole-genome duplication ( Fig. 4 and Supplementary Table S8). The carbohydrate esterase family 6 (CE6, acetyl xylan esterase) was enriched in the C. cucurbitarum genome with seven copies, whereas the other Mucorales had one or two copies of each. This family is only conserved in zygomycetes among all fungal clades 29 , but it is also found in bacterial and plant genomes (http://www.cazy.org/ CE6.html). The CE6 family is annotated with "acetyl xylan esterase" and it improves the saccharification of stem lignocellulose by deacetylating xylan 41 . The gene tree of CE6 family indicated that the fungal CE6 genes were more diverged from the common ancestor of plants and the bacterial CE6 family (Supplementary Figure S10).

Pathogen-host interactions and bacterial virulence genes. The genes involved in pathogen-host
interactions (PHIs) were inferred by identifying their homologs in the PHI-base database 42 . Among 11,977 predicted genes, 2,277 (19.01%) genes shared similarity with the PHI genes. In particular, 1,504 genes were related to plant hosts (monocots or eudicots) and there were fewer of these genes than those in the comparative genomes (1,362-2,967 genes, Table 3 and Supplementary Figure S11) and two Colletotrichum genomes (2,395 and 2,972 PHI genes in C. graminicola and C. higginsianum, respectively), which may be attributable to its small genome size. In total, 226 PHI entries were shared by all 16 genomes (C. cucurbitarum, ten Mucorales, two Glomerales, two Colletotrichum, and Ustilago). These common virulence genes determine the core plant-infecting factors, and they are not specific to hosts. Four known effectors were found in the C. cucurbitarum genome: two copies of PHI:2703 (Xanthomonas axonopodis-derived citrus canker causing effector), one copy of PHI:2347  Clustered nodes were collapsed for clarity. Each node is labeled with UniProt ID with corresponding organism name in brackets. The tree was built using FastTree with the default option.
(grey mould-causing effector), and one copy of PHI:4506 (Epichloe festucae-derived effector). Genes involved in host-interactions in C. cucurbitarum still need to be fully elucidated, so we expect that many genes were missed in this investigation.

Sexual communication genes in the Mucorales and Glomerales genomes. The analysis of tris-
poroids and β -carotene biosynthesis-related genes revealed the conservation and divergence of these genes over Mucorales and Glomerales genomes. As shown in Fig. 5, we investigated four genes related to trisporoids biosynthesis (tsp1, tsp2, tsp3, and acaA) and two genes related to β -carotene biosynthesis (carB and carRA). Tsp3, which encodes carotene oxygenase catalyzing β -carotene cleavage, was well-conserved in the Mucorales genomes as a single copy with an exception of two copies in the R. microsporus genome. The acaA gene had high sequence similarity with tsp3 (33% identity and 50% similarity in Phycomyces proteins), therefore the ortholog-finding method was applied to differentiate the two genes. Interestingly, tsp1 (4-dihydromethyltrisporate dehydrogenase) and tsp2 (4-dihydrotrisporin dehydrogenase) had various degrees of duplications over the genomes ranging from one to eight copies. The phylogenetic tree showed that tsp1 genes were divided into three groups (Supplementary Figure S12), suggesting that tsp1 genes were duplicated for functional divergence in contrast to tsp2 genes (Supplementary Figure S13). Two R. irregularis genomes lacked both tsp3 and acaA genes, but contained tsp1 and tsp2 genes. This observation suggests the synthesis of signal compounds by non-β -carotene pathway, which is also supported by an absence of carB and carRA orthologs in these genomes.

Conclusion
Here were report the genome sequence of C. cucurbitarum KUS-F28377, a pathogen of cucurbits and other plants isolated in Korea. We determined the functional profiles of the C. cucurbitarum genome and compared them with the relatives and other plant pathogenic fungal genomes to identify the genetic features responsible for the mechanisms of plant pathogenicity unique to this genome. In conclusion, exploring the pathogenicity of C. cucurbitarum using genomics can provide crucial insights, which may facilitate the development of disease control strategies.

Methods and Materials
Strain isolation and genomic DNA/RNA preparation. The strain C. cucurbitarum KUS-F28377 (Korean Agricultural Culture Collection No. 48046, http://www.genebank.go.kr/eng/about/aboutus.jsp) was isolated from Cucurbita moschata. Cultures were grown for 10 days on potato dextrose agar (PDA) at 25 °C and the fungal mycelium was harvested. Genomic DNA was extracted by following the protocol provided by Kohler et al. 43 . Genome sequencing and assembly. In total, 10 million read pairs (300 bp per read and 2.79 billion bases) were generated by Illumina MiSeq paired-end sequencing. The raw reads were trimmed and filtered on the basis of sequence quality (20 Q-score cutoff) and length (20 bp cutoff) using HTQC 1.92.3 44 . A preliminary assembly was generated by ABySS 1.9.0 45 with the options of k-mer = 41 and insert size = 400 bp. The assembled contigs were used to construct a simulated jumping library with wgsim (https://github.com/lh3/wgsim), which generated 4.5 million read pairs with an insert size of 3 kbp without errors (-e 0 -d 3000 -s 300 -N 5000000 -1 150 -2 150 -r 0 -R 0 -X 0). Finally, 2,814 scaffolds were assembled by Allpaths-LG r47300 46 with 86.2× sequence coverage. Scaffolds less than 1 kbp were excluded. The potential sequences of contaminants, symbionts, or parasites was checked with Blobology 47 , which plots read-depth coverage calculated by read alignment using Bowtie2 48 and GC content of each scaffold.