Genome analysis of the kiwifruit canker pathogen Pseudomonas syringae pv. actinidiae biovar 5

Pseudomonas syringae pv. actinidiae (Psa) is a destructive pathogen of kiwifruit bacterial canker disease, causing severe economic losses to kiwifruit industry worldwide. Biovar 5 is the most recently reported biovar of Psa, and is found in only a local area of Japan at present. There is not much information of genetic characteristics of biovar 5. Thus, the genome of biovar 5 was sequenced and analyzed to clarify its detailed genetic characteristics. Here, the genomes of strain MAFF 212056 and MAFF 212061 of biovar 5 were estimated to be about 6.3 Mbp and 6.5 Mbp, respectively, and their phylogenetic positions were proved to be near that of biovar 2 in the phylogenetic tree. However, it was confirmed that biovar 5 had neither the coronatine biosynthetic genes conserved in biovar 2, its phylogenetic neighbor, nor the phaseolotoxin biosynthetic genes conserved in biovar 1, Japanese native pathogen. In addition, 45 genes of type III secreted effectors were identified in biovar 5 genomes, showing that their composition is different from that in the other biovars. Moreover, some biovar 5-specific regions were identified. Then, biovar 5-specific PCR primers for targeting these regions were designed, and proved to be applicable for detecting biovar 5 specifically.


Results
Genome information of biovar 5. The genomic DNAs of strains MAFF 212056 (http://www.gene.affrc. go.jp/databases-micro_search_detail_en.php?maff= 212056) and MAFF 212061 (http://www.gene.affrc.go.jp/ databases-micro_search_detail_en.php?maff= 212061), which were isolated from Saga Prefecture in 2012, were sequenced as representative biovar 5 strains. By using the next generation sequencer Ion PGM system (Life Technologies, Thermo Fisher Scientific Inc., Waltham, MA), 3,835,257 single reads of an average length of 189.9 bp, and 6,425,035 single reads of an average length of 189.4 bp were obtained respectively in MAFF 212056 and MAFF 212061. Genome de novo assembly was performed using the CLC Genomics Work bench (CLC Bio, Qiagen, Valencia, CA). After filtering with a Phred score cutoff of ≤ 20, 291 contigs of > 500 bp with an N 50 of 50,639 bp in MAFF 212056 and 687 contigs of > 500 bp with an N 50 of 50,026 bp in MAFF 212061 were obtained (Suppl. Doc 1 and 2). Consequently, the genome size (the assembled sequence size) of MAFF 212056 was estimated to be 6,342,665 bp, corresponding to a 126.4-fold genome coverage and that of MAFF 212061 was estimated to be 6,499,176 bp, corresponding to a 173.8-fold genome coverage. The both genomes have G + C content of 58.5%. The assembled data of MAFF 212056 were subjected to DDBJ MiGAP, and 6,330 protein-coding sequences (CDSs), 4,055 ribosomal binding sites, 55 tRNA genes, and 4 rRNA genes were identified. The sequences of some housekeeping genes of MAFF 212056, such as acnB, cts, gapA, gyrB, pfk, pgi, and rpoD, and 16S-23S rDNA ITS, which have been widely used as markers for phylogenetic analysis, were confirmed to be identical to those of MAFF 212061 and also to those of reported in previous studies 4,7 . Because the entire sequences of these housekeeping genes could be secured from our genome data, and the number of CDSs (6,330) predicted from our genome data proved to correspond to those of Psa (between 5,200 and 6,520) predicted from published genome database (e.g., 5,259 CDS, 5,761 CDSs, and 6,520 CDSs in the case of ICMP 9617, ICMP 18884 and ICMP18886, respectively) 8 , we have decided that our genome data have enough quality for further analysis. ANI analysis. DNA-DNA hybridization (DDH) experiments have been performed to determine genetic similarity between bacteria since the 1960s. Even if there is no information on DNA sequences, DDH experiment can reveal the similarity between compared bacteria. However, nowadays, genomic sequences can be acquired easily, so DDH in silico using genomic sequences has become more practical than conventional DDH experiment. As the representative of such in silico analyses, the average nucleotide identity (ANI) assay is often used 9 . The ANI value is calculated as the mean of identity by BLASTn matches between the virtually fragmented query genome and the reference genome. Hence, ANI can be used as a good tool for phylogenetic analysis at the genomic level. After obtaining the genomic sequences of biovar 5 in this study, ANI analysis of all Psa biovars including biovar 5 became possible for the first time.
ANI values among Psa biovars were obtained using ANI calculator analysis (Suppl. Table 1), showing that the value between strains MAFF 212056 and MAFF 212061 of biovar 5 was approximately 100%. Then, a dendrogram of relatedness by using ANI values was constructed on the basis of UPGMA, which shows that strains MAFF 212056 and MAFF 212061 clustered tightly and independently of the other Psa biovars, among which biovar 2 was the nearest neighbor to biovar 5 ( Fig. 1). Of the existing Psaf lineages (lineages 1 to 4), lineage 1 and lineage 3 were included in this analysis, which grouped together and were entirely separated from Psa biovars 1, 2, 3 and 5. Similar topologies were obtained when other algorithms (complete-linkage clustering method, nearest-neighbor chain algorithm method, and ward method) were used (data not shown).
Toxins and effector genes. The results of ANI analysis ( Fig. 1) and MLSA 4 indicate that biovar 5 is the nearest neighbor to biovar 2. However, unlike biovar 2, cfl gene of the coronatine gene cluster was not amplified from biovar 5 DNA by using PCR 4 . In addition, some components of the tox island, a genomic island which contains argK-tox cluster (phaseolotoxin synthesis gene cluster) 10 , i.e., amtA, desl (ptx), argD (ORF3), and argK, conserved in most biovar 1 strains 4,10,11 , were also not amplified from biovar 5 4 . The type III secreted effector (T3SE) genes, whose translated products are translocated to host plant cells via the type III secretion system, are involved in virulence and/or avirulence 12 . However, the comprehensive composition of T3SE genes for biovar 5 has not yet been clarified. Thus, comparative genome analysis using the Mauve tool and virulent genes search by tBLASTx were performed to confirm the absence of these toxin genes and predict the effector genes in biovar 5 genomes.
When compared with the biovar 1 genome, no homologs of the tox island including argK-tox cluster were found in the homologous site of the MAFF 212056 genome, and only the flanking regions of the tox island in the biovar 1 genome were conserved continuously in the MAFF 212056 genome (Fig. 2). The phaseolotoxin markers, amtA, desl (ptx), argD (ORF3), and argK of the argK-tox cluster 4,10 , were entirely absent in the whole of the MAFF 212056 genome. In addition, since the ordering and orientation of contigs could be inferred using the Mauve tool, the contigs of MAFF 212056 was conveniently concatenated and a single draft genome of MAFF 212056 could be generated. Comparing a draft genome of MAFF 212056 with the biovar 1 genome (ICMP 9617) using the Artemis comparison tool (ACT), it was found that overall structure of the MAFF 212056 genome generally resembled that of biovar 1, but that only the latter possessed the tox island containing argK-tox cluster (Suppl. Fig. 1). Similarly, no homologs of the tox island containing argK-tox cluster were found in the MAFF 212061 genome of biovar 5 (data not shown).
Next, when compared with the biovar 2 genome, no homologs of the coronatine gene cluster were found in the MAFF 212056 genome of biovar 5 (Fig. 3). Also, MAFF 212061 genome were found to possess no homologs of the coronatine gene cluster (data not shown).
Further, by comparisons among the Psa genomes using the Mauve tool and/or tBLASTx, 12 T3SE genes following hopAB3 listed in Table 1 were newly found in Psa genomes in this study, in addition to 51 T3SE genes,  which were reported previously as the total number of T3SE genes of Psa 8 . The total of 63 T3SE genes predicted in this study in conjunction with the previous report 8 were highly homologous to the corresponding genes of ICMP 9617 (a biovar 1 strain) and ICMP 18884 (a biovar 3 strain), whose genomes were completely sequenced (Suppl. Table 2).
The 17 T3SE genes, which were predicted to be conserved in all the other Psa biovars (biovars 1, 2, 3) and Psaf by McCann et al. 8 , were also found to be present in the biovar 5 genomes. In total, the 45 T3SE genes were predicted to be located in the biovar 5 genomes (Table 1). Their composition was confirmed to be identical between MAFF 212056 and MAFF 212061 genomes. Comprehensively, it seems that the set of T3SE genes in biovar 5 resembled the set in biovar 1 or biovar 2.
On the basis of tBLASTx, hopL1 was predicted to be present only in biovar 5 and Psaf. Thus, to further confirm whether Psa biovars and Psaf possess hopL1 or not, PCR analysis was performed using Psa genomic DNA as templates with the hopL1-specific primers designed in this study ( Table 2). As a result, a single band was obtained from the DNAs of biovar 5 and Psaf (Fig. 4), whereas no band was obtained in the case of DNA from biovars 1, 2, and 3, which endorsed the prediction of our comparative genomic analysis.
Biovar 5-specific primers. It may be possible to use hopL1 as a biovar 5 marker; however, this marker cannot discriminate between biovar 5 and Psaf. Thus, to detect only biovar 5 certainly, biovar 5-specific sequences, whose highly homologous regions were not found in biovars 1, 2, 3, Psaf, Pss B728a, Pst DC3000, and P. s. pv. phaseolicola 1448A, were sought using comparative genomic analysis with the Mauve tool. Then, appropriate loci from only five contigs of MAFF 212056 (Contig_002, Contig_034, Contig_044, Contig_047, and Contig_067) were obtained (Suppl. Table 3), and biovar 5-specific primers were designed on the basis of their sequences ( Table 2). All primer sets were confirmed to induce biovar 5-specific amplification by using PCR (Fig. 5). Electrophoresis showed the amplicons derived from biovar 5 templates to be single bands. However, Con047F/Con047R primers, targeting the internal sequence of Contig_047, induced nonspecific bands for biovar 1 templates. The size of the nonspecific bands derived from biovar 1 was close to that of the true amplicon from biovar 5 (Fig. 5D). Thus, four primer sets, except Con047F/Con047R primers (Table 2), proved to be applicable to biovar 5-specific detection.

Discussion
Biovar 5 strains of Psa were found in 2012 for the first time, and their genetic diversity, toxin productivity, and various phenotypic characteristics were elucidated 4 . This biovar was found in only a local area (Saga Prefecture) of Japan, at the time of June 2015 6 . Thus, biovar 5 may be a domestic or endemic pathogen of kiwifruit and other relevant species. By contrast, biovar 3, a pandemic pathogen, could cause a global outbreak 1,3,8 . Genotypic characteristics of the known Psa biovars except biovar 5 have been investigated in detail 1,5,6,8,10,11 , whereas there is lesser information on biovar 5. Therefore, we cannot clearly explain the differences or similarity among the biovars, including biovar 5. In this study, we performed genome sequencing and comparative analysis to obtain detailed genetic information on biovar 5.   The representative strains MAFF 212056 and MAFF 212061 of biovar 5 are found to have genome size of ca. 6.3-6.5 Mbp and more than 6000 CDSs by using genome assembly and gene annotation. In addition, ANI analysis, which corresponds to DDH in silico analysis, indicates that the phylogenetic position of biovar 5 is close to that of biovar 2 at the whole-genome level (Fig. 1). A phylogenetic tree of all biovars was constructed using MLSA of seven housekeeping genes, showing that biovar 5 strains were placed adjacent to biovar 2 strains 4 . Thus,      ANI analysis results obtained in this study are consistent with the MLSA results. However, biovar 5 does not possess the coronatine gene cluster, unlike biovar 2 (Fig. 3). As for T3SE composition, biovar 5 is different from the other biovars, including biovar 2 (Table 1). Moreover, biovar 5 was found to differ from biovar 2 with respect to biochemical characteristics, for example, biovar 5 can hydrolyze aesculin, produce strong fluorescent pigment on King's B medium, and cannot utilize D-mannitol 4 . These results support the conclusion of Sawada et al. 4 that biovar 5 should be regarded as an independent group separated from biovar 2. Hereafter, much effort including genome sequencing and phenotypic characterization of more strains of biovar 5 and biovar 2 will be necessary for unambiguous definition of biovar circumscription.
The tox island containing argK-tox cluster is conserved in most biovar 1 strains 4,10,11 , and the coronatine gene cluster is considered to be conserved in biovar 2 strains 1 ; however, biovar 5, unlike biovars 1 and 2, is proved to have none of these toxin gene clusters. Thus, it is assumed that biovar 5 may produce unknown virulence factors including toxins. In fact, we have estimated that a locus in Contig_047 of MAFF 212056, which is a biovar 5-specific region, might be involved in the synthesis of some secondary metabolites. At this locus, a gene cluster ca. 20 kb, which contains some enzyme genes catabolizing fatty acids and polyketides, was found. This composition of the gene cluster resembled that of other toxin gene clusters such as coronatine 13 . Certainly, further studies are required to confirm whether this region is involved in novel toxin production.
For many plant pathogenic bacteria, not only toxins but also various effectors involved in host interaction are essential to establish infection and cause diseases. Various genes that encode secreted proteins and secretion apparatus, such as the type I, II, III, IV, and VI secretion system, were found in the biovar 5 genome. Their functions and roles need to be studied in the future.
Of various secreted proteins, T3SEs have been the most noted targets. T3SEs are significant proteinous factors for host interaction, and they work as virulence factors in susceptible hosts or as avirulence factors in resistant hosts with corresponding resistant genes 12 . The functions in planta and biophysical and biochemical actions of many T3SEs, especially from Pseudomonas spp., have been analyzed 12 . Consequently, the roles of T3SEs in Pseudomonas pathogens have become revealed considerably. By analogy with them, Psa is also assumed to use T3SEs for the establishment of infection in kiwifruit. Here, by comparative genome analysis in conjunction with the results of McCann et al. 8 , 63 T3SE genes were confirmed in Psa biovars in total, out of which, forty-five genes were found in the biovar 5 genomes (Table 1). These genes are highly homologous to the known corresponding T3SE genes of other biovars and species.
Here, the homolog of hopL1 was found in only biovar 5 and Psaf (Table 1), which was predicted to be homologous to hopL1 of Pseudomonas avellanae BPIC 631, a causal pathogen of bacterial canker in hazelnut (Corylus avellana L.), by using MiGAP annotation and tBLASTx. In addition, genome sequence information showed that some Pseudomonas plant pathogens (e.g., P. cichorii JBC1, P. syringae pv. tomato DC3000, and P. s. pv. syringae B728a) possess the hopL1 homologs. However, the function of the HopL1 protein in host plants is unknown. Field survey and inoculation test have suggested that the pathogenicity of biovar 5 and Psaf may be inferior to that of biovars 1, 2, and 3 4,5 . Thus, it is assumed that hopL1 may be related to the reduction in pathogenicity of biovar 5 and Psaf. Meanwhile, in the biovar 5 genomes, the avrD1 homolog gene was found (Table 1). Because avrD in various Pseudomonas pathogens encodes the syringolide family glycolipid elicitor, which induces gene-for-gene resistance to host plants 14 , biovar 5 may have the ability for syringolide-inducible host interaction. Hereafter, function of these T3SEs of biovar 5 should be investigated further on a genetic and physiological basis.
We designed five biovar 5-specific primer sets (Table 2). Of these primer sets, at least four sets (i.e., Con002F/R, Con034F/R, Con044F/R, and Con067F/R) are proved to be applicable for biovar 5-specific detection. When diagnosis to examine whether biovar 5 strains exist in samples such as pollen or other plant organs is required, plant quarantine inspectors or researchers will be able to perform PCR by using any of these primer sets. Psa is proved to be highly heterogeneous and divided to biovars 1, 2, 3, and 5, whose virulence against kiwifruit is quite different one another 1,4-6 . Thus, a clear distinction is required in the field and quarantine for implementing control measures quickly and efficiently. Biovar 1-specific primers 11 , biovar 2-specific primers 15 , and biovar 3-specific primers 6,16 have been developed, in addition to all Psa-universal primers 17,18 . Therefore, the biovar 5-specific primers developed in this study (Table 2) are essential to complement the diagnostic procedure of kiwifruit bacterial canker disease. Hereafter, biovar 5 can be specifically detected by using these primer sets, and prompt disease control can be implemented.

Methods
Bacterial strains, culture, and DNA extraction. The bacterial strains and their relevant characteristics are listed in Table 3. Bacteria were routinely cultured on Luria agar plates or potato dextrose agar plates at 27 °C. Genomic DNA from bacteria were extracted using the InstaGene Matrix (Bio-Rad, Hercules, CA) or DNeasy Plant mini kit (Qiagen), according to the manufacturer's instructions.
Genome sequencing and annotation. The strains MAFF 212056 (http://www.gene.affrc.go.jp/ databases-micro_search_detail_en.php?maff= 212056) and MAFF 212061 (http://www.gene.affrc.go.jp/ databases-micro_search_detail_en.php?maff= 212061) were chosen as the representative biovar 5 strains. These genomic DNAs were processed to template samples by using the Ion Plus Fragment Library Kit and Ion PGM Template OT2 400 Kit with Ion OneTouch 2 System (Life Technologies, Thermo Fisher Scientific Inc.). Then, the template samples of biovar 5 were sequenced using the Ion Sequencing 400 Kit and a 318 Chip with the next generation sequencer Ion PGM (Life Technologies, Thermo Fisher Scientific Inc.). Sequence data were assembled and analyzed using the CLC Genomic Work Bench (CLC Bio, Qiagen). The genome sequence of biovar 1 strain ICMP 9617 (GenBank accession number: CM002753) was used as the reference for mapping and assembly of sequence reads. Table 3. Bacterial strains used in this study. *Psa "biovar 4" had been transferred to the new pathovar actinidifoliorum (Psaf) 5 , which is now divided into four lineages (lineages 1 to 4) (lineages in Psaf are considered to be equivalent to biovars in Psa).
Scientific RepoRts | 6:21399 | DOI: 10.1038/srep21399 Comparative genome analysis. To find the virulent genes and identify biovar 5-specific sequences, biovar 5 contigs were aligned with the following reference sequences: ICMP 9617 (CM002753) and ICMP 9855 from biovar 1 strains, ICMP 19072 and ICMP 19073 from biovar 2 strains, ICMP 18801 and ICMP 18884 (CM002751) from biovar 3 strains, ICMP 18804 from Psaf (lineage 1) strain, ICMP 18807 from Psaf (lineage 3) strain, and Pss B728a, Pst DC3000, and P. s. pv. phaseolicola 1448A (NC_005773) using Mauve, which is a multiple genome alignment tool (http://darlinglab.org/mauve/mauve.html) 19 . On the basis of the aligned sequences, highly homologous regions or solitary regions between biovar 5 and the references were observed, and biovar 5-specific sequences were collected. In addition, the ordering and orientation of contigs were estimated using the Mauve tool, and the concatenation of contigs of MAFF 212056 was performed to generate a virtual single genome sequence. Then, a draft genome of MAFF 212056 was compared with the biovar 1 genome (ICMP 9617) using the Artemis comparison tool (ACT) (http://www.sanger.ac.uk/science/tools/artemis-comparison-tool-act) 20 .
Toxin synthesis genes and type III secreted effector (T3SE) genes were considered as representative virulent genes, and the homologs of such virulent genes were identified by tBLASTx (E value < 1 e −10 ) using candidate CDSs obtained from biovar 5 genomes as query sequences against NCBI database. Moreover, according to the method of McCann et al. 8 , the homologs were also identified and/or confirmed by tBLASTx (E value < 1 e −5 ) using representative virulent genes obtained from a public database (http://pseudomonas-syringae.org) as query sequences against biovar 5 genomes. Partial hits or truncated/disrupted sequences were indicated as 'incomplete' . The translational identity of T3SE genes present in biovar 5 genomes was confirmed by comparing with the corresponding genes of a biovar 1 strain (ICMP 9617) or a biovar 3 strain (ICMP 18884).