Genetic diversity and phylogeny of South African Meloidogyne populations using genotyping by sequencing

Meloidogyne species cause great crop losses worldwide. Although genetic host plant resistance is an effective control strategy to minimize damage caused by Meloidogyne, some resistant genes are ineffective against virulent species such as Meloidogyne enterolobii. Detailed knowledge about the genetic composition of Meloidogyne species is thus essential. This study focused on genotyping-by-sequencing (GBS) and Pool-Seq to elucidate the genetic relation between South African M. enterolobii, M. incognita and M. javanica populations. Hence, 653 common single nucleotide polymorphisms (SNPs) were identified and used to compare these species at genetic level. Allele frequencies of 34 SNPs consistently differed between the three Meloidogyne species studied. Principal component and phylogenetic analyses grouped the M. enterolobii populations in one clade, showing a distant relation to the M. javanica populations. These two species also shared genetic links with the M. incognita populations studied. GBS has been used successfully in this study to identify SNPs that discriminated among the three Meloidogyne species investigated. Alleles, only occurring in the genome of M. enterolobii and located in genes involved in virulence in other animal species (e.g. a serine/threonine phosphatase and zinc finger) have also been identified, accentuating the value of GBS in future studies of this nature.

Root-knot nematodes (Meloidogyne) are polyphagous, obligate pests that are distributed worldwide and parasitize almost all the higher plant species, resulting in great economic losses 1 . Meloidogyne incognita is generally considered as the most damaging root-knot nematode species worldwide 2 . Since this species can infect Arabidopsis thaliana, it is also a key model system to study metazoan adaptations to plant parasitism, hence its genome has already been elucidated 3 . However, Meloidogyne enterolobii listed as a threat species, can be confused with M. incognita and other thermophilic species due to it exhibiting similar morphological characteristics 4 . Of more significance is that M. enterolobii has the ability to overcome resistance genes that are effective against its thermophilic counterparts Meloidogyne arenaria, M. incognita and Meloidogyne javanica [5][6][7][8] . Meloidogyne enterolobii for example had been reported to reproduce optimally on tomato and pepper that exhibit the Mi-1, N and Tabasco resistance genes, respectively 8 , while M. arenaria failed to reproduce on such resistant plants 9 . This phenomenon has far reaching implications for the management of this species.
Management of root-knot nematodes has been done traditionally by means of chemical control. This approach generally keeps the nematode population under the economic threshold level since eradicating these pests is considered impossible 4,10 . However, the development of resistance against the different chemical compounds and the progressive withdrawal of synthetically-derived nematicides due to animal, human and environmental concerns 11,12 are the main drives for the exploitation and use of environmentally-friendly strategies. Currently, genetic host plant resistance is a very effective and viable strategy to control root-knot nematodes in various agricultural cropping systems 13 . Nonetheless, the existence of virulent root-knot nematode populations reduces the efficacy of this strategy 14 . Meloidogyne enterolobii occurs in many countries and has initially been reported from the Mpumalanga Province in South Africa during the 1990s from guava (Psidium guajava) orchards 15 . Its established occurrence in South Africa fits the hypothesis of the late Dr Kent Kleynhans and Mr Piet Willers that the occurrence and host range is wider than the initial localities and hosts around Mbombela, Mpumalanga (personal communication, Dr Kent Keynhans, Agricultural Research Council-Plant Protection Research Institute, Pretoria, 1998). It has hence been reported from other crop production areas, infecting green pepper (Capsicum annuum); potato (Solanum tuberosum) and tomato (Solanum lycopersicum) 16,17 . This scenario justified a more detailed genetic study of South African Meloidogyne populations to determine if genomic differences linked with virulence exhibited by M. enterolobii could be found between M. incognita and M. javanica.
Several studies have been conducted to elucidate the genetic diversity of Meloidogyne populations by using different molecular techniques, e.g. random amplified polymorphic DNA (RAPD), restriction fragment length polymorphisms (RFLP), PCR based on sequences of rDNA, mtDNA, ITS and IGS, or satellite DNA probe markers [18][19][20][21][22][23][24][25][26] . However, most of these methods are expensive, time-consuming, require several PCR analyses and many nematode individuals 27 . The ultimate drawback of these techniques are that they are targeting only a small part of the genome of a nematode and are hence not optimal for pan-genomic comparison. It is therefore necessary to apply novel and rapid molecular genotyping tools to obtain more detailed information about the genetic diversity between Meloidogyne species.
Single nucleotide polymorphisms (SNPs) are popular and common molecular markers used to study the entire genome of nematodes 28,29 . Significant advances in sequencing technologies are providing lots of information at relatively low cost 30 . Genotyping by sequencing (GBS) 31 is a simple protocol based on next generation-sequencing (NGS) of genomic fragments of organisms (e.g. nematodes) obtained by specific restriction enzymes followed by a bioinformatics pipeline 30 . This enzyme-based reduction of complexity, combined with the use of barcodes for multiplexing considerably reduces sequencing cost while providing genome-wide information 32 . This technique has proven to be useful and accurate to characterize nematode species even when no information about the genome was available 27 . Currently, good reference genomes are available for M. hapla 33 and M. incognita 3,34 . However, no annotated reference genomes exist for M. enterolobii and M. javanica, although some assemblies from whole genome sequencing (WGS) data were published recently 34,35 . Genotyping by sequencing has, for example successfully been applied in combination with Pool-Seq by Mimee et al. 27 , to investigate the genetic diversity among populations of the golden cyst nematode Globodera rostochiensis. Pool-Seq is a method described by Futschik and Schlötterer 36 , which instead of sequencing isolated individuals directly uses a population (several individuals pooled together). When using a sufficiently big pool size, Pool-Seq even showed to be more appropriate for estimating allele frequencies and is more cost effective than sequencing the DNA of individuals 36 .
This study aimed to investigate the genetic diversity of three different Meloidogyne species viz. M. enterolobii, M. incognita and M. javanica using GBS in order to highlight relationships among these species and loci putatively involved in virulence.

Results
SNP calling. The sequencing of 11 Meloidogyne populations digested with PstI/MspI restriction enzymes generated 83 038 291 reads. After initial quality control, 77 095 925 good barcoded reads were kept for further analysis. The UNEAK pipeline identified 2,786 SNPs before filtering. The final dataset contained from 59 to 929 SNPs depending on filtering stringency (Table 1). In order to ensure a good accuracy, the dataset without missing data containing 277 SNPs (minimum call rate = 1.0 and, minimum coverage = 20) was kept for phylogenetic and Principal Component Analyses (PCA). The dataset, containing 653 SNPs, was explored to find interesting markers. The allele frequencies at these loci in the 11 Meloidogyne populations are presented in Supplementary Table 1. When the pipeline was run on the M. enterolobii populations only, 13,047 SNPs were identified. Of these, 2,092 were present in all populations and supported by a minimum coverage of 20 reads (Table 1).
A total of 1,016 variants were called after alignment of the raw reads to the M. incognita genome and 419 of them were in predicted genes. However, very few remained after filtering.  explained 89.7% of the variation in the dataset (Fig. 1A). Although the two populations of M. incognita (R25 and R34) exhibited different genetic relation on this first dimension, this species was clearly separated from M. enterolobii and M. javanica in the second dimension of the PCA explaining 4.1% of the total variation (Fig. 1B). The phylogenetic tree, using the dataset of 277 SNPs, also revealed two main clusters separating M. enterolobii populations from those of M. javanica, with M. incognita populations being intermediate (Fig. 2).
A direct comparison of allele frequencies between species highlighted several SNPs clearly differentiating the three species. Among these, three had different zygosity status between the species in all the populations tested and 31 were homozygous for an allele only in M. enterolobii ( Table 2). The sequence surrounding each SNP has been retrieved from the M. incognita reference genome. Out of these, 19 were located in predicted genes. When screening the variants obtained by aligning the raw reads to the M. incognita genome, 14 were located in genes coding for 10 different proteins and had allele frequencies specific to M. enterolobii (Table 3).

Discussion
This study represents a baseline investigation of the genetic diversity of South African Meloidogyne populations. Root-knot nematodes are reported to infect various crop hosts and to cause great damage and economic losses in South Africa. Since M. enterolobii is known to be highly virulent, and because resistant cultivars are not equally effective against the different Meloidogyne species, accurate and reliable species identification is crucial to use appropriate management strategies.
The GBS method used in this study proved to be useful in identifying diagnostic SNPs for the discrimination of three of the highly damaging thermophilic Meloidogyne species occurring in South Africa. Accurate distinction of M. enterolobii, M. javanica and M. incognita was still challenging using various molecular techniques, except for SCAR-PCR 20 . Although M. enterolobii can be separated from other Meloidogyne species by the use of various universal markers e.g. 28S, COI, 16S and IGS 24,37,38 , distinguishing between M. javanica and M. incognita has been unsuccessful in various studies due to the high genetic similarity among the latter species and M. arenaria [37][38][39][40][41] . In this study, however, several putatively discriminative SNPs to M. javanica, M. incognita and M. enterolobii were identified and will enable the accurate distinction between these species through the development of allele-specific PCR. The good number of SNPs obtained with UNEAK on the M. enterolobii populations alone revealed the high potential of this method and indicated that the technique will be very useful to compare more populations of this species and to include co-variables like virulence or origin for a more in-depth characterization. Ultimately, this kind of comparison should be done using whole genome sequencing. However, when dealing with numerous samples, the cost associated with WGS is still prohibitive. Thus, GBS represents an interesting alternative. This molecular approach was initially described for plants by Elshire et al. 31 , and modified specifically for nematodes by Mimee et al. 27 . It was used for the first time during this study to investigate the genetic diversity in Meloidogyne species. The technique also takes advantage of Pool-Seq (sequencing of composite samples) which removed the fastidious step of isolating and extracting DNA from single juveniles. The bioinformatics pipeline allowed the rapid de novo identification of SNPs, without the need of a reference genome. When aligning the sequencing reads to the closest reference species, M. incognita, we found less good quality SNPs (no missing data and good coverage) than when using the GBS pipeline. This indicates that significant differences exist between the two species. This was confirmed by the eight-fold increase in the number of SNPs when the pipeline was run with M. enterolobii populations alone. As UNEAK will only keep the SNPs that are present in all populations (at minimum call rate = 1.0), this approach indicates that the three species are probably more different than anticipated. On the other hand, some genetic variants were found upon alignment with the M. incognita genome and not by the UNEAK pipeline. This is explained by the high stringency of the pipeline that only tolerates one mismatch by read and rejects sequences with multiple SNPs or other kind of variants. Therefore, combining the two approaches will maximize the discovery rate of SNPs in the genome of Meloidogyne species. Also, it was hypothesized that several mitotic parthenogens Meloidogyne species had acquired pairs of divergent gene copies during past hybridisation event 35 . This will result in an excess of heterozygosity that could complicate classical phylogenetic analyses. One of the advantage of GBS is that it is not affected by ploidy as the UNEAK pipeline compare all the sequences. Thus, if the same sequence is present in many versions in an organism due to ploidy or gene duplication, the pipeline will compute all versions together and output the allele frequency for that sequence and not for each physical locus in the genome. Furthermore, using Pool-Seq, we theoretically captured all the allelic diversity of each population.
PCA analyses confirmed the genetic separation between M. enterolobii, M. javanica and M. incognita. This result was expected since the distinction of these species has been reported using different markers 24,37 . The analysis also revealed a close genetic proximity of the four M. enterolobii populations when the 277 SNPs dataset (present in all four populations) was used. This is in agreement with results obtained by Tigano et al. 22 that described M. enterolobii as a geographic homogenous species with low diversity between populations. Similarly, Onkendi and Moleleki 38 showed 100% homology between sequences of IGS and COII from South African M. enterolobii populations. However, when the GBS approach was used for M. enterolobii only, more variation was observed and suggests that this species could be more diverse than initially thought. As genetic diversity is an important driver of adaptation to new environmental conditions or host plants, it will be interesting to explore this diversity by studying more M. enterolobii populations.
When comparing allele frequencies of 277 SNPs distributed all across the genome, the two populations of M. incognita investigated in this study showed substantial genetic difference, one being related to M. enterolobii and the other to M. javanica. A possible explanation for this phenomenon may be that in this analysis only loci sequenced in all Meloidogyne populations were conserved, meaning that the sequences that were unique to one or two species only (not present or not sequenced in the other) were discarded even if they contain SNPs. Thus, this place the emphasis on the differences that exist in terms of SNPs contained in the genomes of these three Meloidogyne species. This study also highlighted allelic differences between the species that exists in specific genes that could modify protein sequence and function. Ultimately, in depth characterization of the M. enterolobii genome will give us a better understanding of why the species is more aggressive and overcome resistance genes that are effective against M. incognita and M. javanica. For now, 19 SNPs identified by UNEAK and 14 by the alignment on the M. incognita reference genome were located in exons of predicted genes. These variants were specific to M. enterolobii and could affect nematode-host interactions. Some of these genes have already been reported to be involved in parasitism. For example, two SNPs were located in genes coding for a serine/threonine phosphatase. A protein with a similar sequence is a known effector in the Hessian fly, Mayetiola destructor. This plant-galling arthropod uses effectors to modify host cells in a way that is superficially similar to root-knot nematodes. The type-2 serine/ threonine protein phosphatase (PP2C) domain was shown to be associated with the ability of the fly to survive and parasite wheat seedlings in susceptible plants. This protein is also recognized as an avirulence factor in cultivars carrying the H24 resistant genes. Interestingly, a single loss-of-function mutation in that gene is sufficient to overcome resistance in virulent population 42 .
Results from this study also highlighted differences between M. enterolobii and the other species in a sequence coding for a zinc finger, C2H2 domain. These small protein motifs are known to interact with different molecules and are involved in multiple functions from gene transcription and mRNA trafficking to protein folding and apoptosis 43 . Wang et al. 44 , reported that the C2H2 zinc finger PsCZF1 was involved in pathogenesis in Phytophthora sojae by demonstrating that PsCZF1-deficient mutants lost virulence on different soybean cultivars. Another study on Alternaria brassicicola showed that a knockdown mutation in the zinc finger transcription factor AbVf19 resulted in a 90% decrease in virulence 45 . Interestingly, Gross and Williamson 46 identified a novel transposable element in root-knot nematodes that contained a C2H2 zinc-finger motif and that could be involved in the ability of some populations to bypass resistance in tomato mediated by the Mi-1 gene.  Finally, the plant cytoskeleton is hypothesized to play a crucial role in host defense response and a target for virulence 47 . Root-knot nematodes are actively remodeling the cytoskeleton of infected plants during feeding cell and gall formation 48 . This reprogramming is induced by secreted effectors, several being homologous to plant proteins 49 . In this study, we have identified mutations in several M. enterolobii genes that code for proteins required for microtubule and cytoskeleton formation: Cytoskeleton-associated 5, spectrin and plectin repeat.
In this paper, the GBS technique proved to be a powerful and accurate technique to obtain detailed information about the diversity that exists among root-knot nematode species. The bioinformatic pipeline allowed the identification of high-quality diagnostic SNPs from the South African Meloidogyne species. The phylogenetic comparison of these variants generated valuable and novel knowledge about the genetic diversity of Meloidogyne species. Candidate genes associated with virulence were also highlighted and should be further explored to evaluate whether they are involved in M. enterolobii pathogenicity. Species identification. All populations of Meloidogyne species used in this study were identified using morphological and molecular approaches. Morphological identification was done based on female oesophagus and perineal patterns. The molecular confirmation was done by sequencing the 28S rDNA (D2-D3) 51 , COI 52 and NADH5 37 mtDNA, regions and comparing to NCBI database, and by the amplification of a species specific SCAR marker 20,53 . Library preparation and sequencing. Sample preparation and sequencing were done by the Genomics Analysis Core Facility at the Institute for Integrative and Systems Biology (IBIS; Université Laval, Quebec, Canada) according to the GBS method developed by Elshire et al. 31 , using PstI/MspI restriction enzymes. The library was sequenced on one Ion Proton chip (ThermoFisher Scientific) at IBIS.  Raw sequences were also aligned on the M. incognita genome 3 by using burrows-wheeler aligner (BWA) 55 after trimming with Trimmomatic (TRAILING = 20, MINLEN = 20) (http://www.usadellab.org/cms/?page=trimmomatic) and demultiplexing with Sabre (https://github.com/najoshi/sabre). Then, variants and annotations were called with freebayes 56 and SnpEff 57 respectively using the gene-finding format (GFF) of M. incognita.

Population genetics.
A neighbor-joining phylogenetic tree was elaborated by using Gendist and Neighbor modules in PHYLIP v3.695 with the genome-wide allele frequencies obtained from UNEAK. PCA was done in R by using the same sets of allele frequencies with the prcomp() function from the stats package.
The SNPs of interest from UNEAK were retrieved from the M. incognita genome by means of BLASTN with the default parameters, except for a smaller word size of 4, with the Blast2GO application 58 .

Data Availability
The data are submitted to NCBI SRA Portal with the following information. BioProject (PRJNA485255) and accession number (SAMN09786892, SAMN09786893, SAMN09786893, SAMN09786895, SAMN09786896, SAMN09786897, SAMN09786898, SAMN09786899, SAMN09786900, SAMN09786901 and SAMN09786902).  Table 4. Root-knot nematode species used in this study as well as the origin of the species and host plants which it infected.