Candidate gene mapping identifies genomic variations in the fire blight susceptibility genes HIPM and DIPM across the Malus germplasm

Development of apple (Malus domestica) cultivars resistant to fire blight, a devastating bacterial disease caused by Erwinia amylovora, is a priority for apple breeding programs. Towards this goal, the inactivation of members of the HIPM and DIPM gene families with a role in fire blight susceptibility (S genes) can help achieve sustainable tolerance. We have investigated the genomic diversity of HIPM and DIPM genes in Malus germplasm collections and used a candidate gene-based association mapping approach to identify SNPs (single nucleotide polymorphisms) with significant associations to fire blight susceptibility. A total of 87 unique SNP variants were identified in HIPM and DIPM genes across 93 Malus accessions. Thirty SNPs showed significant associations (p < 0.05) with fire blight susceptibility traits, while two of these SNPs showed highly significant (p < 0.001) associations across two different years. This research has provided knowledge about genetic diversity in fire blight S genes in diverse apple accessions and identified candidate HIPM and DIPM alleles that could be used to develop apple cultivars with decreased fire blight susceptibility via marker-assisted breeding or biotechnological approaches.

Scientific RepoRtS | (2020) 10:16317 | https://doi.org/10.1038/s41598-020-73284-w www.nature.com/scientificreports/ amplicons to cover entire gene sequences using a custom python script (available at https ://bitbu cket.org/miche lettd /ampse q-prime r-desig n). The primers were designed using Primer3 (https ://prime r3.ut.ee, version 4.1.0) to amplify 150 ± 10 bp of fragments of the coding sequences from HIPMs and DIPMs. Besides the use of default parameters, all the primers were blast-searched against the GDDH13 v1.1 43 reference genome to exclude putative repeat regions. The known SNPs were masked in the target regions to avoid annealing problems. On average, five primer pairs were selected for each gene in order to amplify gene fragments at intervals of approximately 300-600 base pairs via AmpSeq genotyping (see below) (Supplementary Table S2).
Genomic DNA extraction and AmpSeq genotyping. Total genomic DNA from Malus accessions of Collection 2 (Supplementary Table S1) was extracted from young fresh leaves using the Wizard Genomic DNA Purification Kit (Promega, USA), according to the manufacturer's instructions. The concentration of extracted DNA was analyzed with a NanoDrop Spectrophotometer (Thermo Fisher Scientific, Gand Island, NY, USA) and the quality of DNA samples was assessed by running a 0.8% agarose gel at 80 V for 1.5 h. Genomic DNA was amplified with the previously designed and selected primer pairs (Supplementary Table S2) to amplify amplicon pools, which were sequenced on the Illumina MiSeq Platform (Illumina, San Diego, CA, USA), according to the AmpSeq protocol at the Genomics Platform Facility of Cornell University (Ithaca, NY, USA) 44 .
Sequence analysis and variant detection. The sequences from AmpSeq run were demultiplexed using the barcode sequences. The raw sequences from 93 AmpSeq (Collection 2) were quality assessed using fastqc tool (https ://www.bioin forma tics.babra ham.ac.uk/proje cts/fastq c/) and reads were trimmed and filtered to remove sequencing barcodes and low-quality bases using Trimmomatic software with the parameters: Trailing 20, Leading 20, SlidingWindow 4:15, AvgQual 20, MinLen 25 45 . For the remaining 145 accessions, wholegenome sequencing data were available, and the reads were filtered for quality as described above. The remaining high-quality reads were aligned against the GDDH13 v1.1 43 genome with burrows-wheeler aligner (BWA) with default parameters. PCR duplicate reads were removed with Picard tools 43 . Variants were identified using the HaplotypeCaller function of the Genome Analysis Toolkit (GATK version 4.1) using default parameters to detect SNPs and short Indels across the HIPM and DIPM gene sequences. The final SNP dataset was partitioned into three different datasets for (1) 145 Malus samples (Collections 1), (2) 93 AmpSeq samples (Collection 2), and (3) sum of the two collections. The variants were further filtered using a mean read depth threshold > 3. The variants were annotated using the coding and protein sequences of HIPM and DIPM genes from the GDDH13 v1.1 43 genome. In addition, the gene annotations were obtained from the GDDH13 v1.1 43 genome to annotate variants using the ANNOVAR program 46 .
Assessment of genomic diversity. The genomic diversity across HIPM and DIPM gene sequences was determined by calculating nucleotide diversity (π) and fixation index (Fst) statistics across the three SNP datasets with the program VCFtools 47 . In addition, the population genetic structure was determined by performing principal component analysis (PCA) of the genetic variants with Tassel v5 software 48 .
Fire blight inoculation and data collection. The fire blight inoculation of grafted plants was performed with highly virulent strains of E. amylovora (E2002a, Ea273, and E4001a) 49 to evaluate varying levels of resistance and susceptibility. In 2018, the population was screened with a 1:1:1 ratio of E2002a, Ea273, and E4001a at a concentration of 1 × 10 9 CFU/ml. In 2019, the only Ea273 strain was used at the same concentration as the prior year. The preparation of the inoculum consisted of plating the E. amylovora strain on King's Medium B Base culture media, incubated at 27 °C for 2 days. Visible bacterial colonies were then scraped and added to 1× Phosphate Buffered Saline (PBS, pH 7.4) solution to halt growth of the bacteria. The bacterial suspension was quantified with a Smart Spec Plus Spectrophotometer (BioRad Hercules, CA) based on the optical density at 600 nm and diluted accordingly with 1× PBS to a concentration of 1 × 10 9 CFU/ml. The inoculation was done by cutting the youngest leaf on the newest grown shoot with sterilized scissors dipped in the inoculum. During the early infection period, the temperature and humidity of the greenhouse was raised to 25-27 °C and 75-80%, respectively. After 12 days in 2018 and 8 days in 2019, the data of the fire blight infection was collected by measuring the leaf length (LL; cm) from the scissor cut to the end of leaf blade, leaf necrosis length (LN; cm) from the scissor cut to the visible lesion on the leaf, shoot length (SL; cm) of the current year's growth from the node of the inoculated leaf and shoot necrosis length (SN; cm), the visible necrosis in the current year's growth from the node of the inoculated leaf.
Statistical analysis of fire blight data. Analysis of fire blight data was performed in R Version 3.6.2 50 .
The datasets from each year were analyzed to remove any outliers. In 2019, plants with only old lignified leaves available for infection were noted and filtered out from further analysis. The remaining data were filtered for the genotypes with data for at least three replications. The leaf length (LL), leaf necrosis length (LN), shoot length (SL), and shoot necrosis length (SN), evaluated as above, were used to calculate percent leaf lesion length (PLLL), percent shoot lesion length (PSLL), and percent total distance (PTD) as follows; PSLL is described as percent shoot necrosis (PSN) or percent lesion length (PLL) in previous literature 18 www.nature.com/scientificreports/ separately. For the traits PLLL, PSLL, and PTD, a two-way analysis of variance (ANOVA) was performed to test for significant differences in phenotype across genotypes. To assess the homogeneity of the variances across the 2 years of data and disease traits within accessions, a non-parametric Fligner-Killeen test was conducted on the PLLL, PSLL, and PTD traits. In addition to the genotypic effects, the ANOVA model also accounted for the length of the inoculated leaf as covariates as well as the interaction effect between the individual genotype and the leaf length. The disease phenotype for each year was clustered to coerce accessions into classes of resistances using the partition k-means clustering approach. To perform clustering, the infection data for the traits PLLL and PTD of each genotype was scaled to a distribution mean of zero and a standard deviation of one via a Z-score normalization. Euclidean cluster distances were calculated using the 'k-means' package in R Version 3.6.2 based on the algorithm of Hartigan and Wong (1979). The number of clusters was determined using the 'NbClust' package in R Version 3.6.2 based on 30 unique indices that determine the optimal number of clusters representing a dataset 50,52 .
Association between fire blight susceptibility and genomic variations. The fire blight phenotype data and the corresponding SNP genotypic data was used to conduct iterative marker-trait associations. The SNP genotype file was phased with Beagle v5.1 53 and converted from the IUPAC ambiguity codes to a dominant genetic model, where '1' represents the homozygous SNP states and '0' represents the heterozygous sites. Each vector of genotype states for each marker was matched against the phenotype data for PLLL, PSLL, and PTD to perform iterative independent Kruskal-Wallis H-Tests with a custom loop created in R version 3.6.1 50 . The output provided p values for each SNP marker across all three traits and each year of data collection to identify significantly associated SNPs. Highly reliable marker-trait associations were determined through the criteria of high statistical significance with p < 0.001, their detection in multiple years and significance in leaf and shoot susceptibility. The fire blight susceptibility values of SNPs satisfying these criteria were plotted to observe the effect the genotype state has on specific traits. To find patterns across wild and domesticated Malus species, the proportion and frequency of species that possessed each selected SNP state was analyzed. The LD relationship of each pairwise marker combination was calculated in order to assess the relation of selected SNPs to the neighboring markers. This was performed first with the PLINK v1.90b6 genomic software to generate .ped and .info file format 54 . The files were imported in the Haploview v4.2 software to visualize marker relationships and find haploblocks 55 .

Results
Genetic diversity in homologs of HIPM and DIPM. Nine homologs of HIPM and DIPM genes were identified in the GDDH13 apple genome sequence 43 with sequence length ranging from 1.4 to 3.7 kilobases (Table 1). These genes were present across 8 chromosomes in the apple genome. Two duplicated gene copies were present for DIPM2, DIPM3, DIPM4, and HIPM1, whereas DIPM1 had a single copy present in the GDDH13 genome. It appeared that two copies of DIPM2 located on Chr13 likely evolved through tandem duplication events due to their close proximity. The duplicated copies from remaining HIPM and DIPM genes were distributed across different chromosomes that might have occurred through whole-genome duplication events.
To assess the genetic diversity across DIPM and HIPM genes, we used publicly available whole-genome sequences from 145 diverse Malus accessions (collection 1). A total of 677 SNP variants were identified across these genes. The least variable gene was DIPM4a with 18 variants while the highest, 189 variants, were found in HIPM1a (Table 1). Nucleotide diversity (π) varied from 6.02 × 10 -03 to 2.66 × 10 -03 for DIPM4a and DIPM2a, respectively, with an average π of 1.38 × 10 -02 . The gene annotation analysis identified 65 total variants as nonsynonymous mutations that can lead to amino acid changes in a protein. There were also 3 frameshift and 2 non-frameshift Indels present in HIPM1a, DIPM3a, and DIPM4b genes. Furthermore, inference of population genetic structure using principal component analysis (PCA) with 677 SNP variants did not reveal any distinct clusters across the 145 Malus accessions (Supplementary Figure S5A,B). Altogether, these results demonstrated the high level of genomic diversity present in the DIPM and HIPM genes in apples. www.nature.com/scientificreports/ The lack of phenotypic information for accessions in this dataset hindered the possibility of making an association analysis between SNP markers and fire blight susceptibility. Thus, amplicon sequencing was employed to amplify and assess the genetic diversity in the HIPM and DIPM genes in 93 selected apple accessions (collection 2; Supplementary Table S1) 44 . The latter analysis identified a total of 87 SNP variants across 9 DIPM and HIPM genes ranging from 4 (HIPM1b) to 16 (DIPM2a) ( Table 1). The average π was 7.96 × 10 -03 in 9 genes that varied from 4.5 × 10 03 (DIPM4a) to 1.8 × 10 02 (HIPM1a). Moreover, annotation of gene sequences revealed that 19 SNP mutations lead to changes in amino acid sequences of proteins in the DIPM gene homologs. At least one amino acid-changing mutation was identified in each of the DIPM genes. For instance, a single nucleotide change from "G" to "A" (position = 379) in DIPM4a can change the amino acid glycine to serine (Supplementary Table S2). Similarly, three different SNP mutations in the DIPM4b gene can change amino acid sequence from valine to aspartic acid, phenylalanine to leucine, and lysine to arginine. We further used the DIPM-HIPM SNP dataset to infer population genetic structure across 93 diverse Malus accessions from Collection 21 using PCA. Two accessions were removed from the analysis because they had more than 50% missing data. PCA identified three groups in the 93 Malus accessions. A set of 7 accessions including GMAL 2590, GMAL 2366, GMAL 1527.f1, GMAL 3052.g1, 'Prince George' , and 'Glabrata' showed close grouping along PC1 values less than − 3.0 (Supplementary Figure S5B). Another set of 9 GMAL and "Kola'' accessions form a dispersed group with PC1 values between − 1.0 and − 2.0 (Supplementary Figure S5B). Finally, the third and largest group consisted of 77 accessions with highly diverse genetic backgrounds (Supplementary Figure S5B; Supplementary Table S1).
Comparative variant analysis between collection 1 and collection 2 indicated higher genomic variation and amino acid-changing mutations across HIPM/DIPM genes from 145 accessions, probably due to near complete coverage of these gene sequences in the GDDH13 apple genome. However, thirteen annotated variants were shared between 145 NCBI and 93 AmpSeq samples, out of which only 2 on the gene "MD08G1095100" were nonsynonymous (Table 1) Figure S2). The distributions of the leaf lengths and shoot lengths were normal across all accessions. The percent leaf lesion length (PLLL) was bimodal, centered around the extremes of the severity scale, the percent shoot lesion length (PSLL) was right skewed, and total lesion length measured as percent total distance (PTD) was also right skewed (Supplementary Figure S1 Figure S2A,C).
The variation in PLLL, PSLL, and PTD during 2018 and 2019 was mainly determined by the genotypes ( Table 2). For example, the ANOVA results showed that genotypic effects significantly influence fire blight susceptibility levels consistently for both leaf lesion (PLLL), shoot lesion (PSLL) and total lesion length in both years ( Table 2). The k-mean clustering analysis of phenotypes showed larger ellipses for the susceptible class compared to the resistant class (Supplementary Figure S3). The clustering of genotypes based on the PLLL and PTD values identified three fire blight infection severity groups: susceptible, moderate, and resistant. The increase in fire blight severity of the susceptible group compared to the resistant group was 66.7% for PLLL and 27.2% for PTD in 2018. In 2019, the increase in fire blight severity of the susceptible group compared to the resistant group was 79.3% for PLLL and 36.3% for PTD.

Association between fire blight susceptibility and SNPs in HIPM and DIPM. The Kruskal-Wal-
lis H-test performed in R Version 3.6.2 50 identified 30 unique SNP markers that showed significant association to fire blight susceptibility at p < 0.05, 2 of which showed high significance at p < 0.001 ( Fig. 1; Supplementary   Table 2. A summary of fire blight data collected in a controlled greenhouse during 2018 and 2019 for 93 Malus accessions. The summarized data shows mean, standard deviation (SD), and analysis of variance (ANOVA) p value for fire blight susceptibility evaluations. Whereas p < 0.001: ***.    Table S3). This marker was present in the DIPM2b gene on chromosome 13. Another SNP marker S14_14033636, on the DIPM4b gene region on chromosome 14, showed significant association (p < 0.001) with all three traits in 2019 (Fig. 1). According to haplotype analysis, the SNPs significantly associated with fire blight susceptibility were in low linkage disequilibrium (LD) with flanking markers (Supplementary Figure S4A,B). Boxplots of the homozygous and heterozygous genotypes at these SNP states showed clear phenotypic differences in their susceptibility levels (Fig. 2). The heterozygous state for S13_2611083 SNP marker had higher rates of fire blight severity and conferred 1.8 and 2.6-fold increases in susceptibility in 2018 for PLLL, and PTD, respectively ( Fig. 2A). The homozygous allele state of S14_14033636 SNP marker was associated with higher fire blight susceptibility and showed 5.4, 2.5, and 3.7-fold increases in susceptibility in 2019 for PSLL, PLLL, and PTD, respectively (Fig. 2B). In 2019 for the same marker, there were 2.5, 2.2, and 2.6 fold increases in susceptibility for PSLL, PLLL, and PTD from changing the homozygous SNP to the heterozygous.
We further observed additive allelic effects on the fire blight susceptibility from the most significant SNPs across HIPM and DIPM genes. For example, accessions possessing alleles linked with reduced susceptibility from both S13_2611083 and S14_14033636 markers showed an additive phenotypic response of significantly less fire blight susceptibility for PLLL (p value = 2018: 0.011, 2019: 0.0084) and PTD (p value = 2018: 0.019, 2019: 0.0011) and vice-versa (Fig. 3). The differences for PSLL were only significant in 2019 between the most susceptible class ' A + K' and the two less susceptible classes 'R + K' , and 'R + G' (p value = 0.007,0.02; respectively) (Fig. 3).
The distribution of marker alleles across Malus species showed that the allelic states linked with higher susceptibility levels were mostly present in M. domestica accessions (Fig. 4A,B). In 2018, the average increase of susceptibility in domesticated accessions as opposed to wild accessions were 2.13, 6.12, and 5.22% for PLLL, PSLL, and PTD, respectively. In 2019, the average increases of susceptibility in domesticated accessions were 13.67, 6.68, and 5.5% for PLLL, PSLL, and PTD, respectively. SNP alleles with reduced levels of susceptibility mainly occurred in wild Malus species, whereas the proportion of higher susceptibility SNPs were found in M. domestica (Fig. 4A,B).

Discussion
Genomic and genetic characterization of the S genes, which are responsible for successful pathogen infection, is a logical approach to achieve sustainable disease tolerance in improved cultivars 56,57 . Natural genetic variation in these genes can be tapped to develop cultivars with decreased fire blight susceptibility, a major goal of apple germplasm improvement efforts worldwide. The susceptibility genes, HIPMs and DIPMs, have been identified to regulate susceptibility and improved tolerance against fire blight in commercial apple cultivars 35,38,40,58 . We have found a great number of genomic variations in the 9 HIPM and DIPM genes across 145 diverse Malus accessions. We also found 30 SNPs with significant (p < 0.05) associations to fire blight susceptibility, two of which had strong associations in two consecutive years of fire blight infection. A majority of the marker allelic states related to higher levels of susceptibility in these trait-associated markers were found in M. domestica, which is consistent with the current understanding that domesticated cultivars are generally susceptible to fire blight 7 . A higher proportion of species with the less susceptible allele were seen in wild species, with a shift to 50:50 in Malus hybrids. This is a possible indication of an inheritance pattern where a new mutation is introduced in a population and the more susceptible alleles are maintained and proliferated by selection during apple domestication. These susceptibility alleles were likely maintained in the domesticated germplasm due to fewer instances of cross-hybridization with wild apple species, to avoid unnecessary linkage drag of undesirable traits affecting fruit quality in apple cultivars. The SNP marker S13_2611083 on DIPM2b showed significant association (p < 0.05) with PLLL and PTD in 2018 and (p < 0.001) in 2019. Robustness and stability of this SNP marker across two years makes it a reliable marker to screen for fire blight susceptibility and for use in marker-assisted breeding. Similarly, the strong association (p < 0.001) of SNP marker S14_14033636 on the DIPM4b gene with PLLL and PTD indicates its potential use in marker-assisted selection to decrease fire blight susceptibility. The latter finding is consistent with previous work in which knocking out the MdDIPM4 gene via CRISPR/Cas9 led to a 50% reduction in fire blight symptoms on notoriously susceptible 'Gala' and 'Golden Delicious' cultivars 40 . These results also highlight utilization of candidate-gene based association studies to rapidly identify molecular markers linked to traits of interest for progeny and parent selection in breeding programs 59 . Breeding from a perspective of susceptibility rather than resistance can help resolve issues in tree fruit crops where linkage drag and cross incompatibility between Malus species and cultivated apples makes introgression of resistance alleles not feasible or very time consuming in many cases 11 . Hence, S-gene breeding opens the opportunity for breeders to more effectively improve fire blight resiliency of high value apple cultivars while not compromising on critical consumer focused traits.
Our results suggest that additive effects from multiple small effect loci in HIPM and DIPM genes contribute towards overall variation observed in disease susceptibility of domesticated apples. For example, combinations of positive alleles of S13_2611083 and S14_14033636 SNPs from two different loci have a stronger effect on fire blight susceptibility measures (i.e., PLLL and PTD). This is in contrast with the effect of major resistances contributed by wild apple germplasm, where a single gene or locus determined the fire blight resistance response without interacting with other loci in the genome 16,[60][61][62] . The breakdown of single gene resistance/s by novel fire blight strains is highly likely, as shown for the breakdown of Robusta 5 (MR5) fire blight resistance by a deletion in the avrRpt2 effector of Ea1189 E. amylovora strain 20 . Similarly, different strains of E. amylovora have been Figure 2. Boxplots of fire blight susceptibility at homozygous and heterozygous states of significantly associated SNP markers (A) S13_2611083 and (B) S14_14033636, respectively. The y-axis represents phenotypic values for percent leaf lesion length (PLLL), percent shoot lesion length (PSLL), and percent total distance (PTD). The "G" and "A" represent homozygous, and "K" and "R" represent heterozygous genotypic states for S13_2611083 and S14_14033636, respectively, on the x-axis. The  21 . These studies indicate that introgression of single gene resistances into commercial germplasm lie at a risk of breakdown by rapidly evolving pathogen strains. Emergence of a virulent strain for the particular resistance gene poses significant threat to breeding and selection efforts based on few major genes. Alternatively, additive alleles from different pathways can help achieve durable reduced susceptibility in a cultivar. For instance, selection of appropriate allelic combinations from HIPM/DIPM genes can lower successful pathogen infection, leading to reduced fire blight susceptibility. The root system can also influence fire blight susceptibility levels through co-regulated gene expression patterns and system-level interactions between carbohydrate and defense pathways 63 . Therefore, allelic combinations for the optimal root system can be used to breed rootstocks to further reduce fire blight susceptibility of grafted scions. These results depict a few examples related to disease susceptibility regulation, but apparently there are still several unexplored routes associated with disease susceptibility in the domesticated apples. Nonetheless, the presence of additive gene action from multiple pathways that are associated with disease susceptibility can make it harder for new pathogens to completely overcome the reduced disease susceptibility, which could be feasible to improve disease tolerance in apple breeding programs 56 .
The results presented in this study also represent strain-specific differences in host plant responses to fire blight as observed in the identification of different significant SNPs between 2018 and 2019. A mixture of three strains (Ea273, E4001a, E2002a) was used in 2018 while only E2002a was used in 2019 to inoculate the plants.
The pure inoculum of the highly virulent E2002a strain generally showed higher disease incidence in 2019 for PLLL and PTD ( Figure S2). In the mixed inoculum of 2018 infections, E2002a was present in lower concentrations and probably had more competition for host resources with Ea273 and E4001a. Furthermore, E2002a is a native North American strain, first discovered in Ontario, Canada off the "Jonathan" cultivar 64,65 . This strain likely had higher co-evolutionary compatibility with the North American domesticated accessions in this study. The chances of host-pathogen compatibility are mainly determined by the profile of effector proteins in individual E. amylovora strains that are responsible for fire blight pathogenicity and triggering of the hypersensitive response in Figure 3. Pairwise Kruskal-Wallis H-tests to measure the additive effects of S13_2611083 and S14_14033636 significant SNP (single nucleotide polymorphism) markers in years (A) 2018, and (B) 2019. The x-axis represents one of the four possible combinations of homozygous and heterozygous genotype states from markers S13_2611083 (G or K) and S14_14033636 (A or R). The y-axis represents the fire blight susceptibility response in accessions carrying specific allele combinations. Fire blight susceptibility was measured as PLLL (percent leaf lesion length), PSLL (percent shoot lesion length), and PTD (percent total distance). The numbers on top of brackets for each subplot represent the p value of significance for pairwise comparison of each allelic combination for a specific trait and year.

Scientific RepoRtS
| (2020) 10:16317 | https://doi.org/10.1038/s41598-020-73284-w www.nature.com/scientificreports/ www.nature.com/scientificreports/ Malus 6,66 . However, the strains used in this study exhibit high genome similarity 49 and their considerably different phenotypic and genetic responses in host plants can lay out further investigations of Malus-Erwinia interactions. In summary, a great genomic diversity was found in HIPM and DIPM genes in a wide set of Malus accessions and a candidate gene-based approach identified markers that were strongly associated with fire blight susceptibility. The validation of these markers could provide opportunities to use them in MAS to breed apple cultivars with reduced fire blight susceptibility. Moreover, characterization of S-genes can provide an alternative to improve fire blight resistance, either by breeding or genome-editing to avoid unnecessary risks associated with compromised fruit quality and breakdown of major resistance genes from wild Malus species.