A new inclusive MLVA assay to investigate genetic variability of Xylella fastidiosa with a specific focus on the Apulian outbreak in Italy

The Olive Quick Decline Syndrome by Xylella fastidiosa subspecies pauca is among the most severe phytopathological emergencies nowadays. In few years, the outbreak devastated olive groves in Apulia (Italy), potentially endangering the entire Mediterranean basin. This research aimed to develop a multiple locus VNTR analysis assay, a molecular tool to differentiate between populations of the pathogen. It has already been successfully applied to different X. fastidiosa subspecies from various plant hosts. The previously published TR loci, together with a set of new design, have been tested in silico on the genome of the Apulian De Donno strain. The resulting selection of 37 TR loci was amplified on the genomic DNAs of the Apulian strains AND from representatives of X. fastidiosa subspecies, and directly on DNA extracted from infected plants. The assay clearly discerned among subspecies or even sequence types (ST), but also pointed out variants within the same ST so as to provide more detailed information on the dynamics and pathogen diffusion pathways. Its effective application even on total DNAs extracted from infected tissues of different host plants makes it particularly useful for large-scale screening of infection and for the strengthening of containment measures.

First of all, numerous markers resulted to be the same, even if reported in the papers with different names and amplified with different primers. In some cases, the tandem repeat is also reported as reverse and opposite sequence. More specifically, SSR20 marker 32 is the same as COSS1 marker 38 , SSR28 marker 32 is the same as marker ASSR-14 33 , SSR30 marker 32 is the same as marker OSSR-19 33 , SSR32 marker 32 is the same as COSSR6 marker 38 , CSSR-17 marker 33 is the same as COSSR3 marker 38 , OSSR-9 marker 33 is the same as marker ASSR-20 33 , OSSR-14 marker 33 is the same as CSSR45 marker 38 , OSSR-16 marker 33 is the same as CSSR-20 marker 33 , OSSR-17 marker 33 is the same as CSSR-7 marker 33 , CSSR-18 marker 33 is the same as GSSR-6 marker 33 , and GSSR-12 marker 33 is the same as CSSR42 marker 38 . These correspondences between markers from different studies and their respective positions along the De Donno genome are summarized in Table S1. In these cases only one pair of primers was chosen for the amplification, i.e. the ones whose sequences best fitted X. fastidiosa subsp. pauca strain De Donno (loci highlighted in yellow in Table S2). In addition to these duplications, several other discrepancies have been found in comparison to the De Donno genome, which are marked as nucleotides in red in Table S2. Specifically for markers from the study of Della Coletta-Filho et al. 32 , it was not possible to detect the reverse primer for SSR26 marker, SSR32 marker contains 2 different tandem repeats, both of 8 bp (the one reported is in green) and one SNP in the forward primer, for SSR36 and SSR40 markers, the TR was not retrievable even if their respective primers were detected with few differences, for both; finally, none of the two primers for the SSR34 marker amplification was found. Regarding the markers described by Lin et al. 33 , it was not possible to find the forward primer of OSSR-12 marker; OSSR-19 marker has an SNP in the reverse primer; only a single TR was found for CSSR-4, CSSR-6, GSSR-14, GSSR-15, GSSR-19, GSSR-20 markers; TR is absent for CSSR-12 and CSSR-13 markers; the primers of CSSR-16 marker show multiple annealing sequences; a different TR sequence and one SNP were found in the reverse primer for ASSR-16 marker, one SNP was found in the forward and one in the reverse primer of both ASSR-19 and GSSR-4 markers. No discrepancies were found in the 7 VNTR loci described in the paper of Francisco et al. 38 . After this check, several markers were discarded due to duplication, failure to detect the primer sequence or repeat sequence. Also, all the differences detected between the primer sequences and the corresponding pairing sequences on the De Donno genome were accordingly corrected (nucleotides in green in Table S2) for the synthesis of the primers for this study that are reported in Table 1.
New VNTR loci identification. The searching procedure by Tandem Repeat Finder (TRF; https ://tande m.bu.edu/trf/trf.html) 43 has led to the identification of 25 VNTR loci, which have also undergone an in silico check to ascertain their position on the De Donno genome and to verify any correspondence with loci selected from literature. Again, this correspondence was found for 5 of them, which were consequently discarded. At the end, 45 total VNTR loci were selected (25 from the literature and 20 newly designed) to be used in the following experimental procedures.

PCR amplification of VNTR loci.
A first round of PCR was done using only genomic DNAs from two strains, the De Donno strain from olive and the V104 strain isolated from oleander, to validate the efficacy of the entire MLVA assay. The primer pairs related to newly designed VNTR loci (TR3, TR11, TR14, TR16, TR17 and TR25) produced multiple amplicons, indicating the presence of multiple pairing sites for at least one of the two primers and were consequently discarded. Also, SSR40 locus 32 invariably produced a 133 bp amplicon, corresponding exactly to the sum of the only flanking regions, thus indicating the absence of the tandem repeat; whereas OSSR-2 locus 33 invariably produced a 181 bp amplicon, smaller than the sum of the flanking regions; both were discarded. Table 1 shows the list of 37 loci and the related final primers used in the amplification step. According to the amplicon sizes obtained by capillary electrophoresis, the number of repeats per locus and per strain was calculated. Amplification failure was eventually coded as "0". Thus, the haplotype of each individual was defined as the ordered sequence of 37 numbers, as reported in Table S3. It has to be noticed that the amplification of two DNAs obtained from infected tissues of Prunus dulcis and Polygala myrtifolia in the province of Lecce (Pd_Le2 and Pm_Le10) provided multiple bands in some loci. Due to these discrepancies, which cannot be properly coded as input data, both samples were excluded from the following analysis.
Data analysis. The haplotypes obtained by the amplification of 37 VNTR loci on a total of 51 DNAs are reported in Table S3, which includes the genomic DNAs extracted from 15 strains of the CFBP collection, 9 strains isolated in Apulia and 27 total DNAs as extracted from infected plant samples; the last 2 lines of the Table S3 reveal that multiple repeats have been detected in some loci of the samples Pd_Le2 and Pm_Le10, which were not considered in the subsequent analyses. The first run of data processing was carried out limited to the 15 strains from CFBP collection by hierarchical clustering using Bruvo's distance ( Fig. 1) to check the efficacy of the method to assess genetic differences between subspecies and sequence types (STs). Four main clusters were formed, each corresponding to one subspecies of X. fastidiosa. However, the strains of the same subspecies yet belonging to different STs are separated by genetic distances greater than those measured between strains belonging to the same sequence type, i.e. the two strains of subspecies pauca belonging to ST74 and two, out of three, belonging to ST53. Following the focus of the paper, the data obtained from VNTR loci amplification of DNAs extracted from strains isolated in Apulia and from DNAs extracted from infected tissues of different host plants in the same region were added for the second run of analysis. We were expecting that the very high number of loci analyzed here would highlight even minimal differences between samples. www.nature.com/scientificreports/ geographic location were found completely identical to each other along the 37 loci. The hierarchical clustering of all the samples (Fig. 2) maintains the same structure as the previous dendrogram, with the addition of all the Apulian specimens to the subspecies pauca cluster. Within this group, the CFBP8429 strain, isolated from Coffea arabica plants in Angers (France) in 2015 and reported in the CFBP database as belonging to the ST53, results significantly separated from all the others ST53 from Italy. Lastly, the 38 DNAs of Apulian origin were analysed independently to better appreciate their relationships, given that their reciprocal genetic variability turned out to be extremely low. Indeed, an identical number of repetitions was obtained in as many as 18 loci out of 37 (48%), and, among the remaining 19 loci, 11 changed sporadically in few samples, whilst only 7 loci (TR7, TR8, TR12, OSSR-16, OSSR19, ASSR-16, and COSSR-4) showed frequent variations in the numbers of repeats. Table S4 reports the position of the latter loci, amplified by the primers used in this study, along with the genome of De Donno strain of X. fastidiosa subsp. pauca, and the incidental inclusion of coding sequences. In Fig. 3, the Minimum Spanning Trees (MST) illustrate how the haplotypes of these samples are linked to each other at best. The MSTs reveal that no specific correlation of the haplotypes with the species of host plants from which they were obtained ( Fig. 3a) can be inferred, nor with the specific geographic origin (Fig. 3b).  GGA TAC AT  GCT ACA CGT GCA ACAAC  ATT GCT G   2  SSR21  AAC ACG GAT CAA GCT CAT G  GGA ACA CGC AAT AGT AAG A  TGT TAT C   3  SSR28  GTA ACG CTG TTA TCT CAA T  ATT ACG CTT CTT ATC GCT GT  GTG TGC CT   4  OSSR-9  TAG GAA TCG TGT TCA AAC TG  TTA CTA TCG GCA GCA GAC  TTT CCG T   5  OSSR-16 GCA AAT AGC ATG TAC GAC  GTG TTG TGT ATG TGT TGG  CTG CTA   6  OSSR-19 GCT GTG AAC TTC CAT CAA TCC  GCA AGT AGG GGT AAA TAT GAC  CAG GAT TGG AGG TA  TTC TAG   26  TR6  ACA TCG GAG GTA GGC TGT GA  ATT GAA GAC CCT TTT CAG CC  CGC TTA T   27  TR7  GGG TTG GGT CTT TTA TTT GC  CAT TGA CTC TCA ACC CTG CTAC GCTGT   28  TR8  GCG GTT TGG TTG TAT TGC TT  CTC ACA TCA CGC ACC GAC GA  GAC AGG   29  TR9  GGT GTG CCG TGT ACA TTG AG  TTG CCA TCA CCG ACA CCT CT  ATG ATC TGA   30  TR10  CGT GCT GAA GTC TTG CTT GA  ACT TCA CCC TAC CCT GCA TA  GTA ACG   31  TR12  AGG GAT ATA GTG CCG CGA TT  TTT TGT GGT CGA ACG TGC GG  GGT GTG A   32  TR15  ATG CAG CGG TAG TCC CTC TA  CAC GAT GCC CAC GTA GCA GC  GTG TCG   33  TR18  TGT CAT GAC CGT GCT TAT GG  TGG TGG TCA AGG CAG CGG  CCG CCG CCG TAA CCA CCG   34  TR19  CTG CCT TGA CCA CCA CCA C  ACA AAG CTC TCT GAT CAA TCAC CCA CTC CAG CTG   35  TR21  CAG GGT GTA TGG CCT GAA GT  CCT ACC ATC CAT GCA GCA AC  CAG CAC AT   36  TR23  CAG GAG CCT CCA TGA ACA AT  AAT GAT CCT TGC TGG GTC AG  CTT CAA GAG   37  TR24  ATG GCC CAA ACA TAC TCC AA  TGT TCA TAT CTT GGT CTC

Discussion
Xylella fastidiosa is one of the most feared bacterial plant diseases nowadays. Several biological features make its management very challenging, first and foremost for its ability to colonize an astonishing number of plant hosts, often symptomless, as well as for the difficulty in its isolation and detection. A crucial issue to better understand the spread dynamics is the genotyping of the pathogen, which can be carried out by different molecular techniques. MLST is the method of choice for Xylella fastidiosa genotyping at subspecies level and beyond 44 , based on the detection of SNPs included in seven constitutive genes 45 . By this method, it is possible to further categorize the six subspecies of Xylella fastidiosa in 87 different sequence types (STs) 8,22,23,26,46 . The protocol entails the specific PCR amplification of the genes using DNA extracted from both cultured strains and infected plant tissues, followed by Sanger sequencing, preferably in both strand directions 44 , that take time to get the final results. This might be a limit if large numbers of samples are to be analysed, as in epidemic monitoring 4,10,26,27 . According to this method and to our knowledge, all the strains related to the Apulian outbreak in Italy analysed so far are univocally assigned to the genotype ST53 of X. fastidiosa subsp. pauca 11,25,47 . MLVA has proven suitable in genotyping Xylella as well, with a specific focus on assessing genetic variability among strains involved in local outbreaks. This distinctive feature is testified in numerous studies on different host plants such as grapevine 48 , orange 35 , almond tree 36 and coffee 38 . Only recently, a first study using a panel of 12 SSRs on X. fastidiosa subsp. pauca strains from olive trees in Brazil was published 46 . This study explored the potential of this method in www.nature.com/scientificreports/ detecting subtle genetic differences among a selection of samples from the Xylella outbreak in Apulia, relying on its high sensitivity coupled with a mere amplicon dimension measurement. To our knowledge, a new inclusive MLVA assay was developed for the first time, and applied to this scenario. It's worth emphasizing the importance of a preliminary in silico analysis of markers when obtained from multiple literature sources. Indeed, the screening carried out here revealed that several loci from previous independent studies were the same. This leads to the first noteworthy consideration that, even if lots of repeats can be found along a whole bacterial genome, the effective loci with features suitable for proper genotyping are much less common and numerous. Moreover, a clearer classification of markers to avoid misunderstanding and confusion in their use appears is highly needed.
As an example, in Safady et al. 46 twelve markers from literature have been used 33,38 ; however, our in silico analysis demonstrated that two of them, i.e. CSSR42 and GSSR12 loci, refer to the same repetition, giving the same result. This kind of misunderstanding can negatively affect results and possibly lead to incorrect conclusions. In silico screening and the first round of PCR testing have almost halved a large number of potential markers (50 from literature and 25 newly designed) to a selection of 37 affordable TR loci. However, this is still a conspicuous set -thoroughly tested -to detect genetic differences among X. fastidiosa DNAs from the Apulian outbreak. Two analytical approaches were independently applied to haplotype data according to their peculiarities. The Bruvo's genetic distance 49 is particularly suitable for genetic markers as tandem repeats because it accounts for repeat length into calculation and is not sensitive to ploidy levels [50][51][52] . For these reasons it was used to build the dendrograms of Figs. 1 and 2. The approach proposed by Francisco et al. 53 , in which the efficacy of eBURST algorithm 54 was reinforced with additional rules to better elucidate possible patterns of evolution, was used to analyse the Apulian dataset in detail. Although MLVA is not primarily aimed to ascertain phylogeny, due to its inherent sensitivity in appreciating minimal differences between individuals, this assay has proven proficient to correctly cluster subspecies of Xylella fastidiosa under the recognized phylogeny. However, the use of few representatives doesn't allow to draw consistent conclusions. When compared with MLST categorization, this assay seems to have a higher discriminatory power, as highlighted by the distinction within STs in the results. Therefore, it is of particular importance that CFBP 8429 strain, isolated from a coffee plant intercepted in Angers (France) in 2015, and belonging to the ST53 (as reported in CFBP database), showed substantial differences, in MLVA analysis, in the number of repetitions compared to the other samples of the same ST53. This led to its independent positioning in hierarchical clustering, with 100% bootstrap support, and in MST. However, we cannot exclude that this discrepancy could be related to misidentification of its ST, so that a larger comparison with isolates of the same subspecies from other geographic regions is recommendable to validate this hypothesis. In this study, the MLVA assay was steadily successful in ascertaining differences within single ST, differentiating between almost all the samples. This could make its utilization valuable in scenarios like the Apulian outbreak, where a highly clonal population of X. fastidiosa subsp. pauca, belonging only to ST53 and whose origin is probably attributable to a single infection, is under investigation. In our data, such clonality is widely confirmed and information capable of differentiating between individuals are relegated to a few loci. As to these most variable loci, looking at their correspondence in the genome sequence of the De Donno strain, it is noticeable that in five out of seven the tandem repeats are included within classified or hypothetical proteins, highlighting their role for the adaptation ability of the pathogen. From a practical point of view, those VNTR loci could be selected to arrange a further MLVA assay capable to resolve close genetic relationships in this scenario. Once their robustness is confirmed, they could be multiplexed to obtain a final assay for large-scale monitoring. However, the small differences revealed within ST53 strains in this analysis don't show evidence for specific relationships with the species of the host plant or with the geographic origin of the strains. This has not to be necessarily meant as a failure of the method. Most probably, these differences account for the first signal of random variability within the Xylella population in Apulia; they do not reflect yet the effect of any evolutionary pressure toward hostspecific variants, nor to the effect of spatial separation toward local independent populations. This is consistent with the multi-host nature of the pathogen and the relatively short time lapse since the first outbreak. Since Xylella fastidiosa is very "fastidious" and time-consuming for isolation and culturing, MLVA assay developed in this study has, similarly to other molecular techniques, the significant advantage to eventually screen the DNA extracted straight from the infected plant material without losing reliability. In this respect, we also postulate that this method could diagnose infections by multiple genotypes in the same plant tissue, such as for the two samples Pd_Le2 and Pm_Le10, in which some loci showed simultaneously multiple amplicons, corresponding to different numbers of repetitions. In conclusion, all the results seem to indicate that this novel MLVA assay has the potential to become a valuable method for thorough monitoring of the Italian outbreak of Xylella fastidiosa subsp. pauca, as well as for any other Xylella fastidiosa epidemics.

Methods
All the 24 strains analysed in this study are reported in Table 2, where subspecies, host plant, geographic origin, time of isolation, and ST classification are also indicated. Fifteen strains (marked with § ) were sourced from the CIRM-CFBP (Collection Française de Bactéries associées aux Plantes). These strains were grown on Buffered Charcoal-Yeast Extract (BCYE) medium at 25 °C for 3-4 weeks; 100 mg of bacterial cells were then collected, and DNA was extracted with the Nucleospin Plant kit (Macherey Nagel) according to the manufacturer's instructions. Similarly, 9 strains (marked with *), were isolated from different host plants in Apulia and their genomic DNAs were extracted from freshly grown strains at CIHEAM-Mediterranean Agronomic Institute of Bari (CIHEAM-IAMB). The remaining 27 samples (marked with • ) are instead constituted by total DNAs extracted straight from tissues of plants whose infection by X. fastidiosa was previously assessed. These include DNAs from different host plants in various locations in Apulia, i.e. in the provinces of Lecce, Taranto, and Brindisi (Fig. S1 55,56 ). All DNAs were checked by q-PCR to confirm their belonging to X. fastidiosa according to the protocol described in Harper et al. 57 .
Scientific RepoRtS | (2020) 10:10856 | https://doi.org/10.1038/s41598-020-68072-5 www.nature.com/scientificreports/ to estimate precisely the size of the amplicons. The DNA High Resolution cartridge was used for all samples and the OM800 method was run, as recommended, to achieve maximum accuracy (2-3 bp maximum error) with amplicons ranging in size from 200 to 500 bp. No template controls (SDW) and size markers were included in each run. The results were analysed and interpreted using the ScreenGel v.1.6.0 software (QIAGEN), which gives accurate estimation of both size and concentration of amplicons. Then, the number of tandem repeats at each VNTR locus was calculated by subtracting the flanking regions size from the amplicon size and dividing the remaining by the repeat unit length. In case of loci with a truncated final repeat, the copy number was rounded down to the previous integer. To validate the robustness of the results, the entire PCR and capillary electrophoresis procedure was performed at least twice per sample. To check the accuracy of tandem repeats calculations, a random selection of amplicons was Sanger sequenced, as well as for each case of disputable TR number attribution. A final data matrix (Table S3) of 51 samples and 37 numbers of TR for each locus was produced.
Data processing. The string of integers obtained as above constitutes the haplotype of each strain under investigation. These were reciprocally compared using independently two analytical approaches suitable for this type of data, the hierarchical clustering and the goeBURST algorithm. Data were analysed in two steps: first, only the 15 strains from the CFBP collection were included to check the effectiveness of the method in maintaining the correct subspecies structure and in detecting differences among Sequence Types (STs); then, the remaining 34 DNAs, all related to the Apulian outbreak, were added to assess their relative positioning and grouping.
Hierarchical clustering. The data matrix was imported into R version 3.4.4 (R Core Team, 2018). The genetic distance among individuals was calculated using Bruvo's distance 49 , and bootstrapped using the poppr bruvo. boot() function with a cut-off threshold of 80%. The hierarchical clustering was then obtained by hclust() function of the R package stats (R Core Team, 2018) using UPGMA as agglomerative algorithm. The final dendrogram was visualized with the R package factoextra version 1.0.5 58 .
Globally optimized eBURST algorithm (goeBURST-Phyloviz). The software Phyloviz 2.0 59 was used for the goe-BURST analysis and to build the MST, a tree in which the sum of the distances among all the isolates, as represented by data, is the shortest possible.