Introduction

Chickpea (Cicer arietinum L.) is a valuable cool-season grain legume crop grown worldwide. It is a self-pollinated, diploid plant (2n = 2x = 16) with a genome size of ~ 740 Mb1, which is considerably smaller than those of other important legume crops such as pea, lentil, alfalfa, soybean and peanut2. The genus Cicer L. belongs to the family Fabaceae, subfamily Faboideae, and contains a total of 49 taxa comprising 9 annuals and 40 perennials3,4,5,6. Toker et al.7 recently introduced a new annual wild Cicer species, thereby increasing the count to 10 annual species. C. arietinum is the only cultivated species of the genus. C. reticulatum is considered to be the wild progenitor of the cultivated chickpea8. It is crossable with the cultivated chickpea and possesses 2n = 2x = 16 chromosomes, with a smaller genome size (416 Mb) than that of the cultivated chickpea9.

Chickpea plays a valuable role in the human diet as a rich source of dietary proteins, complex carbohydrates and micronutrients such as iron, potassium and zinc, as well as vitamins A and B, folate and thiamine10. Because of its capacity for biological fixation of atmospheric nitrogen through nodulation with Rhizobium species, it is an advantageous crop in crop rotation11. Chickpea is also the most important cool-season food legume in arid and semi-arid areas under rainfed conditions12. Globally, the harvested area was approximately 14.8 million ha and total production was almost 15.1 million tons of chickpeas in 202013. It is widely grown and consumed in India, Pakistan, Iran and Turkey13.

Various biotic and abiotic factors affect chickpea production worldwide14,15. Because genetic diversity in cultivated chickpea is limited, efforts to increase productivity have had restricted success16. Conventional methods have been used in crop breeding and for improving tolerance to environmental stresses, while molecular breeding approaches have the potential to accelerate the development of new cultivars. In addition, the effective use of plant genetic resources in breeding depends on knowledge of the genetic variation present within individuals or populations.

Molecular markers explore genetic diversity at the DNA level and can reflect the precise genetic diversity between genotypes17. In chickpea, random amplified polymorphic DNA (RAPD)18,19,20, amplified fragment length polymorphism (AFLP)21,22, simple sequence repeat (SSR)23, inter simple sequence repeat (ISSR)24,25,26 and internal transcribed spacer (ITS)27 markers have been used for genetic diversity analysis in different germplasm. Recently, extensive progress has been made in developing genomic or transcript-based SSR markers and SNP markers and in deploying them in large-scale genomics and breeding programs in chickpea28,29,30,31,32,33,34,35. In contrast to SNP markers, SSRs are very convenient and easy to use. SSRs can be found in both coding and noncoding regions of all higher organisms. Their genome-wide occurrence, co-dominant inheritance, high polymorphism and multi-allelic nature promote the wide utilization of SSRs36,37,38. Earlier, the usual protocol for isolating microsatellite sequences relied on microsatellite-enriched libraries followed by cloning and Sanger sequencing, which was costly, difficult and time consuming39.

Recently, the development of next-generation sequencing (NGS) technologies has enabled fast and cost-effective SSR discovery in many crops. Numerous methods now apply NGS for genotyping, including reduced representation libraries (RRLs), restriction-site-associated DNA sequencing (RADseq), genotyping-by-sequencing (GBS) and whole-genome resequencing (WGRS)40,41,42. WGRS is more appropriate for pre-breeding activities in which a small number of elite parents, landraces and wild species need to be examined in detail for genome variation (SNPs, CNV, structural variation) and association studies43. The efficiency of WGRS has been shown in many crops such as rice44,45, sorghum46, cotton47, soybean48, tomato49, and chickpea50,51,52,53. In view of these prospects, genome-wide SSR markers were developed in chickpea in the present study. The utility of the developed markers was assessed in an F6 population derived from an interspecific cross between C. arietinum and C. reticulatum. The cross-transferability of these markers was also examined across 30 chickpea genotypes including cultivated and wild types.

Results

Genotyping

A total of 2.01 GB and 2.16 GB of raw sequence data were generated for C. arietinum and C. reticulatum, respectively, from 150 bp paired-end sequencing. C. arietinum had 34.77 M reads and 33% guanine-cytosine (GC) content, while C. reticulatum had 33.60 M reads and 34% GC content. The mean proportions of reads mapped to the C. arietinum reference genome were 97.56% and 96.62% in C. arietinum and C. reticulatum, respectively.

Variant detection

Using the variant calling pipeline, 3.9 M and 4.7 M variants were initially detected in the C. arietinum and C. reticulatum genomes, respectively. Of these variants, 3.26 M SNPs were identified in C. arietinum and 3.93 M in C. reticulatum relative to the reference genome. In total, 35,329 and 44,331 InDels were identified in C. arietinum and C. reticulatum, respectively. Of these, 3387 InDels of 2 bp in length were detected in C. arietinum and 4704 in C. reticulatum. Among these 8091 InDels, 58 di-nucleotide regions that were polymorphic between the two species were selected and used for primer design (Table 1).

Table 1 The primer sequences of the 58 SSR markers developed and used in this study.

SSR validation in RIL population

The designed primer pairs were validated in 30 chickpea genotypes of the F6 population obtained from an interspecific cross between C. arietinum and C. reticulatum. With the exception of SSR31 and SSR32, all primer pairs amplified successfully. The PCR products were loaded on a polyacrylamide gel, and allele sizes were determined by comparison with those of the parents, C. arietinum and C. reticulatum. The difference in allele sizes was also confirmed on the gel. All 30 genotypes carried one of the parental alleles. SSR5 and SSR10 resolved a 2-nucleotide polymorphism between the female and male parents in the 30 RIL genotypes, whereas SSR14 resolved an 8-nucleotide polymorphism and SSR18 a 6-nucleotide polymorphism between C. arietinum and C. reticulatum (Table 1).

Chi-square (χ2) values were calculated for each marker to test whether its segregation in the 30 genotypes representing the RIL population fitted the expected 1:1 ratio, and markers deviating from the expected Mendelian ratio were identified (Table 2). The calculated p values for all markers except SSR20 were greater than 0.05, indicating that these markers fitted the 1:1 segregation ratio.
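The goodness-of-fit test described above can be illustrated with a minimal sketch. It assumes that each RIL is scored as carrying one of the two parental alleles and that heterozygous or missing scores are excluded; the allele counts in the example are hypothetical and are not taken from Table 2. For one degree of freedom, the p value can be obtained from the complementary error function, so no statistics library is needed.

```python
import math

def chi_square_1to1(n_a, n_b):
    """Chi-square goodness-of-fit test of two allele classes against 1:1 (df = 1)."""
    total = n_a + n_b
    expected = total / 2.0  # each class is expected in half of the lines
    chi2 = ((n_a - expected) ** 2 + (n_b - expected) ** 2) / expected
    # With one degree of freedom the chi-square upper-tail probability
    # reduces to erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical marker: 17 lines with the maternal allele, 13 with the paternal one.
chi2, p = chi_square_1to1(17, 13)
print(f"chi2 = {chi2:.3f}, p = {p:.3f}")
```

A marker is treated as fitting the 1:1 ratio when p is greater than 0.05, which is the criterion stated in the text.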

Table 2 Chi-square (χ2) values for each marker to test the fit of the markers in the RIL population to the expected 1:1 segregation ratio.

SSR diversity in cultivated and wild populations

For genetic diversity analysis, 30 genotypes from cultivated and wild species were run on polyacrylamide gels, and bands were scored according to allele size. In total, 244 alleles belonging to 41 different SSR loci were detected in the 30 chickpea genotypes (Table 3). Allelic diversity at the population level in cultivated and wild populations is shown in Fig. 1. The total allele count was 63 in cultivars and 311 in wild genotypes. A total of 110 alleles were detected in the C. reticulatum genotypes and 112 alleles in the C. echinospermum genotypes, while 89 alleles were detected in the population of distantly related wild species. The mean number of alleles (Na) for the 30 genotypes was 2.36 (Table 3). The highest numbers of alleles were obtained with the primers SSR3, SSR58 and SSR39 (Table 3). The number of effective alleles (Ne) varied between 0.75 and 3.74. Nei's54 observed (Ho) and expected (He) heterozygosity values were 0.08 and 0.34, respectively. The mean polymorphism information content (PIC) was 0.73 (Table 3). The highest PIC value was observed at the SSR21 locus (0.90), followed by the SSR56 (0.88), SSR54 (0.86), SSR4 (0.85), SSR7 (0.83) and SSR34 (0.83) loci. The lowest PIC value was found at the SSR9 locus (0.51) (Table 3).

Table 3 Summary of genetic diversity statistics for 30 chickpea genotypes.
Figure 1

Allelic patterns and gene diversity across cultivated and wild populations. The figure compares the number of alleles (Na), the number of alleles with a frequency of at least 5%, the number of effective alleles (Ne), the number of private alleles, etc.

A phylogenetic tree of the 30 chickpea genotypes was constructed using the UPGMA clustering method with the newly developed SSRs (Fig. 2). The chickpea genotypes were divided into four clusters, indicating clear separation between wild and cultivated species. Cluster I contained the cultivated chickpeas, including four kabuli and four desi chickpeas. Clusters II, III and IV consisted of wild chickpea species, representing C. echinospermum, C. reticulatum and the other wild chickpea species, respectively.
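For readers unfamiliar with UPGMA, the following is a minimal sketch of the clustering step on a small pairwise distance matrix. It is not the implementation used by the tree-building software; the genotype labels and distances are hypothetical, and the choice of distance measure is left open. At each step the two closest clusters are merged, and the distance from the merged cluster to every other cluster is the size-weighted average of the two old distances.

```python
def upgma(labels, dist):
    """Cluster items by UPGMA.

    labels: list of names; dist: symmetric matrix (list of lists) of distances.
    Returns (tree, merges): a nested-tuple tree and a list of
    (left_subtree, right_subtree, node_height) in merge order.
    """
    n = len(labels)
    clusters = {i: (labels[i], 1) for i in range(n)}  # id -> (subtree, size)
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    merges = []
    while len(clusters) > 1:
        # Pick the closest pair of active clusters.
        (a, b), d_ab = min(d.items(), key=lambda item: item[1])
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        # Size-weighted average distance from the merged cluster to the rest.
        new_d = {}
        for c in clusters:
            d_ac = d[(min(a, c), max(a, c))]
            d_bc = d[(min(b, c), max(b, c))]
            new_d[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        # Drop every distance that involves the two merged clusters.
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        d.update(new_d)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        merges.append((tree_a, tree_b, d_ab / 2.0))  # node height is half the distance
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree, merges

# Hypothetical distances between two cultivated and two wild genotypes.
names = ["cultivar_1", "cultivar_2", "wild_1", "wild_2"]
matrix = [
    [0.0, 0.1, 0.6, 0.7],
    [0.1, 0.0, 0.6, 0.7],
    [0.6, 0.6, 0.0, 0.2],
    [0.7, 0.7, 0.2, 0.0],
]
tree, merges = upgma(names, matrix)
print(tree)
```

In this toy example the two cultivated genotypes join first, the two wild genotypes join second, and the two groups are joined last, which mirrors the kind of wild versus cultivated separation described in the text.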

Figure 2

UPGMA-based dendrogram of the 30 wild and cultivated chickpea genotypes generated using SSR markers.

The PCoA confirmed the clusters of the phylogenetic tree (Fig. 3). Cultivated and wild genotypes did not cluster together. The two informative components explained 92.36% of the cumulative variance, with PC1 and PC2 accounting for 53.72% and 38.64% of the variation, respectively.

Figure 3

Principal coordinate analysis (PCoA) of the 30 chickpea genotypes with SSR markers.

Discussion

NGS technology is an effective tool for the identification of SSR markers

SSRs are valuable genetic markers owing to their co-dominant inheritance and their multi-allelic, reproducible nature55. In chickpea, large numbers of SSR markers have been identified and widely used for genetic diversity analysis, gene/QTL mapping, linkage map construction and marker-assisted selection (MAS)33,56,57,58,59. However, validating and selecting informative, polymorphic markers from such a large pool requires considerable effort. In addition, the narrow genetic base of chickpea may restrict the use of the identified markers in genotyping studies because of their low intra-specific polymorphism among chickpea genotypes23,30. NGS technologies have brought major advances in sequencing, generating high-throughput sequence data that are transforming genotyping and plant breeding, and they provide opportunities for high-throughput SSR identification. In the present study, we developed genome-wide SSR markers from cultivated and wild chickpea genotypes. SSR marker development from genomic data has been reported for various crops such as sesame60, red clover61, peanut62, sweet potato63, faba bean64 and lentil65.

Distribution of variants in C. arietinum and C. reticulatum genome

Alignment to the chickpea reference genome identified a total of 3.26 M SNPs in C. arietinum and 3.93 M in C. reticulatum. Previously, 51,632 SNPs were reported from 454 transcriptome sequencing of C. arietinum and C. reticulatum genotypes35. In addition, a couple of hundred SNPs have been studied using Solexa/Illumina sequencing, targeted amplicon sequencing, mining of expressed sequence tag libraries and sequencing of candidate genes30,66,67.

Validation and polymorphic potential of SSRs

Exploiting the genetic diversity held in chickpea genetic resources is very important for making use of collections and improving breeding studies. Genetic diversity analysis in chickpea was previously performed using RAPD18, AFLP68, STMS69 and SSRs70,71. In this study, the effectiveness of the developed markers was evaluated in 30 chickpea genotypes from cultivated and wild species as well as in 30 chickpea genotypes of the F6 population obtained from an interspecific cross between C. arietinum and C. reticulatum. The markers detected a total of 244 alleles (Na). The mean number of alleles (2.36) observed in this study is within the range reported by previous studies. For instance, the use of 33 SSR markers identified a total of 111 alleles with an average of 3.7 alleles per locus in 155 chickpea genotypes72. Similarly, 27 SSRs were used to study genetic diversity in 50 chickpea accessions, which revealed a total of 81 alleles with an average of 3.0 alleles/locus73. In the present study, heterozygosity ranged from 0.03 to 0.66 with a mean of 0.34, which is similar to values reported previously by Upadhyaya et al.74 and Hajibarat et al.75. Genetic diversity analysis showed that the average PIC value of the SSR markers was 0.73, higher than the PIC values of the SNP76, STMS77,78, AFLP20 and SilicoDArT79 markers used to identify genetic variation in chickpea. Botstein et al.80 classified markers by PIC value as highly informative (≥ 0.5), reasonably informative (0.25–0.50), or least informative (≤ 0.25). Our average PIC value (0.73) thus shows that the markers developed here are highly informative and sufficient for showing relationships among genotypes, according to Meszaros et al.81. The principal coordinate analysis clearly separated the whole population into four clusters, with wild and cultivated types in separate clusters. These results are consistent with previous studies71,82: the grouping followed a clear pattern between cultivated chickpea and the wild species. It is also clear that the wild progenitor, C. reticulatum, showed close proximity to the cultivated chickpea. The other close connection was seen between C. reticulatum and C. echinospermum. The cluster analysis thus supports the effectiveness of the designed markers.

The results of the present study demonstrate the success of SSR identification and marker development in chickpea using NGS genome data. The developed SSR markers were applied successfully to reveal genetic diversity among cultivated and wild chickpea populations and were validated in an F6 population obtained from an interspecific cross between C. arietinum and C. reticulatum. Therefore, the 58 newly developed SSR markers are potentially useful for genetic studies of chickpea.

In conclusion, the NGS strategy led to the discovery of a large number of microsatellite markers, providing thousands of SSRs for validation in chickpea. These new SSRs will become significant molecular tools for chickpea breeding programs. In the future, these markers could be integrated into genetic maps for use in MAS.

Materials and methods

Plant material

C. arietinum L., CA 2969 and C. reticulatum Ladiz., AWC 602 were used as the genetic material for WGRS analysis. The CA 2969 and AWC 602 chickpea genotypes were registered by USDA-ARS and Akdeniz University, Department of Field Crops, respectively. The important traits of these genotypes are given in Table 4. The developed SSRs were validated in 30 chickpea lines from a RIL population previously developed by Sari et al.83 and derived from an interspecific cross between CA 2969 and AWC 602. The markers were also used to assess the genetic diversity of cultivated and wild chickpea accessions including eight accessions of C. arietinum (four kabuli and four desi chickpeas), eight accessions of C. reticulatum, eight accessions of C. echinospermum P.H. Davis and six accessions of C. anatolicum Alef., C. canariense A. Santos & G.P. Lewis, C. microphyllum Benth., C. multijugum Maesen, C. oxyodon Boiss. & Hohen. and C. songaricum Steph ex DC. (Table 5). Seed samples from ICARDA and USDA are available directly from ICARDA (https://www.icarda.org/) and USDA (https://www.usda.gov/). The procurement of seeds of all cultivated and wild genotypes used in the present study complies with relevant institutional, national, and international guidelines and legislation.

Table 4 Important morphological and the specific-known traits of the parents used for WGRS analysis (*Chrigui et al.15).
Table 5 Cultivated and wild Cicer species.

Experimental area

Plants of the parents (CA 2969 and AWC 602) and of the 30 cultivated and wild chickpea accessions were grown in separate pots in a greenhouse at the Faculty of Agriculture, Akdeniz University, Antalya, Turkey (30°38′E, 36°53′N, 33 m above sea level) for genomic DNA extraction.

DNA extraction

DNA extraction was carried out at the Plant Molecular Biology Laboratory, Faculty of Agriculture, Akdeniz University, Antalya, Turkey. Genomic DNA was extracted individually from young leaves of 3-week-old plants using the CTAB method described by Doyle and Doyle84 with minor adjustments, such as extra chloroform-isoamyl alcohol and 70% ethanol cleaning steps. The DNA quality and quantity of each sample were estimated by electrophoresis on 1% agarose gels, and the concentration was adjusted to 100 ng/μL using lambda DNA as a reference.

Library preparation and sequencing

Genomic DNA from C. arietinum and C. reticulatum was used to construct HiSeq sequencing libraries using the TruSeq DNA Sample Prep Kit LT (Set A), FC-121-2001 (Illumina, San Diego, CA, USA) according to the manufacturer’s protocol. A reduced representative genomic library with a target insert size of about 350 bp was sequenced on an Illumina HiSeq X to generate 150-bp paired-end reads at Macrogen Inc. (Seoul, Korea). The WGRS data of the two genotypes were deposited in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) database.

The raw data were demultiplexed using Je V1.285, quality control was performed on the FASTQ Sanger files using fastp86, and reads with a Phred quality score below 15 were trimmed87. The cleaned data were aligned to the kabuli reference genome 1.01 using Bowtie 2 with default parameters88 in the Galaxy software (www.usegalaxy.org). The resulting BAM files (*.bam) were analyzed using Freebayes (Galaxy Version 1.1.0.46-0)89, with simple diploid calling and filtering and a minimum of 20× coverage for variant detection. The resulting variant files were filtered using VCFfilter (Galaxy Version 1.0.0), and SNPs were selected. Insertions and deletions from the individual (*.vcf) files were later merged into a single VCF file using VCF genotypes (Galaxy Version 1.0.0).
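To make the Phred threshold concrete: a quality score Q corresponds to an estimated base-call error probability of 10^(-Q/10), so Q15 is an error rate of roughly 3%. The sketch below decodes a Phred+33 (Sanger) quality string and trims low-quality bases from the 3′ end of a read. It is a simplified stand-in for illustration only and does not reproduce the algorithm of the quality-control tool used in the study; the function names and the example read are hypothetical.

```python
def phred_to_error_probability(q):
    """Convert a Phred quality score to the estimated base-call error probability."""
    return 10 ** (-q / 10)

def decode_phred33(quality_string):
    """Decode a Sanger (Phred+33) FASTQ quality string into integer scores."""
    return [ord(ch) - 33 for ch in quality_string]

def trim_low_quality_tail(seq, quality_string, threshold=15):
    """Remove bases from the 3' end while their quality is below the threshold."""
    scores = decode_phred33(quality_string)
    end = len(scores)
    while end > 0 and scores[end - 1] < threshold:
        end -= 1
    return seq[:end], quality_string[:end]

# Hypothetical read: six high-quality bases followed by two poor ones.
read, qual = trim_low_quality_tail("ACGTACGT", "IIIIII#!")
print(read, qual)
print(f"Q15 error probability: {phred_to_error_probability(15):.4f}")
```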

The combined variant file was processed using Microsoft Excel to eliminate duplicated regions and organize the SSRs according to their sizes. SSR regions that were 2 bp long and polymorphic between the parents were checked using the Integrated Genome Browser V9.1.4.
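The selection logic described above (keep InDels of 2 bp that differ between the two parents) can be sketched programmatically. The sketch assumes that the indel calls of each parent have already been read into a dictionary keyed by chromosome and position, and that a parent without a call at a locus carries the reference allele; the data structure, chromosome names and alleles are hypothetical, and this is not the spreadsheet workflow used in the study.

```python
def indel_length(ref, alt):
    """Length difference between the reference and alternate alleles."""
    return abs(len(alt) - len(ref))

def polymorphic_indels(parent_a, parent_b, size=2):
    """Return loci with an indel of the given size that differs between parents.

    parent_a, parent_b: dict mapping (chrom, pos) -> (ref, alt) indel calls.
    A missing key is interpreted as the reference allele (an assumption).
    """
    selected = []
    for locus in sorted(set(parent_a) | set(parent_b)):
        call_a = parent_a.get(locus)
        call_b = parent_b.get(locus)
        if call_a == call_b:
            continue  # identical allele in both parents: not polymorphic
        calls = [c for c in (call_a, call_b) if c is not None]
        if any(indel_length(ref, alt) == size for ref, alt in calls):
            selected.append(locus)
    return selected

# Hypothetical indel calls for the two parents.
cultivated = {("Ca1", 1050): ("GAT", "G"), ("Ca1", 2000): ("C", "CTA"), ("Ca2", 500): ("A", "ATTT")}
wild = {("Ca1", 2000): ("C", "CTA"), ("Ca2", 500): ("A", "AT"), ("Ca3", 70): ("TAG", "T")}
print(polymorphic_indels(cultivated, wild))
```

Candidate loci selected this way would still need visual confirmation in a genome browser, as described in the text.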

Primer design

Primer pairs were designed from the flanking sequences of the identified SSRs using Primer3 software90,91 with the following parameters: primer length of 18–27 nucleotides, melting temperature of 55–65 °C, GC content of 30–70%, and predicted PCR product size of 100–300 bp. The primer pairs were then checked for possible duplication of sequences in the genome using IGB software.
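The design constraints listed above can be expressed as a simple screening function. This is a rough sketch, not a substitute for the primer design software: the melting temperature here uses the Wallace rule (2 °C per A/T and 4 °C per G/C), a crude approximation swapped in for illustration that becomes less reliable for longer primers, whereas dedicated tools use thermodynamic models. The primer sequence in the example is hypothetical.

```python
def gc_percent(seq):
    """Percentage of G and C bases in a primer."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)

def wallace_tm(seq):
    """Approximate melting temperature (deg C) by the Wallace rule."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc

def passes_criteria(primer, product_size):
    """Check a primer against the length, GC, Tm and product-size ranges in the text."""
    return (
        18 <= len(primer) <= 27
        and 30.0 <= gc_percent(primer) <= 70.0
        and 55 <= wallace_tm(primer) <= 65
        and 100 <= product_size <= 300
    )

# Hypothetical 20-nt primer with a predicted 180-bp product.
primer = "ATGCTAGCTAGGTCATGCAA"
print(gc_percent(primer), wallace_tm(primer), passes_criteria(primer, 180))
```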

PCR was performed using the M13 tailing procedure92. The forward primers were tailed by adding an IRDye-labeled M13 sequence to the 5′ end. The following PCR protocol was applied: initial denaturation at 95 °C for 5 min; 30 cycles of 95 °C for 30 s, annealing at 60 °C for 30 s and 72 °C for 1 min; 9 cycles of 95 °C for 30 s, 55 °C for 30 s and 72 °C for 1 min; and a final extension at 72 °C for 10 min. PCR products were loaded onto an 8% denaturing polyacrylamide gel and separated on a 4300 DNA Analyzer (LI-COR, Inc., Lincoln, Nebraska, USA). A 1 kb size marker was used, and markers were scored as 1 or 0 for the presence or absence of alleles.

Statistical analyses

RIL data were analyzed using MINITAB 19 software. A chi-square (χ2) test was used to assess the goodness of fit of the observed segregation ratios to the expected 1:1 ratio in the RIL population.

Genetic diversity and phylogeny analysis

Genetic diversity parameters such as the number of alleles (Na), number of effective alleles (Ne), Shannon diversity index (I), expected heterozygosity (He), unbiased expected heterozygosity (uHe), observed heterozygosity (Ho) and Wright’s fixation index (F) were calculated using GenAlEx 6.593. The phylogenetic tree was constructed in DARwin ver 5.0 software94 using the unweighted pair group method with arithmetic mean (UPGMA)95 clustering method and modified in FigTree v1.4.4 (http://tree.bio.ed.ac.uk/software/figtree). Principal coordinate analysis (PCoA) was performed with GenAlEx 6.5 to evaluate the genetic relationships between populations. The Excel microsatellite toolkit96 was used to measure polymorphism.
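As an illustration of how the per-locus parameters named above relate to allele frequencies, the following sketch computes them for one co-dominant locus scored in diploid individuals. It uses the standard textbook definitions (Ne = 1/Σp², He = 1 − Σp², uHe = He · 2N/(2N − 1), F = (He − Ho)/He, I = −Σp ln p, and PIC = 1 − Σp² − Σ 2pᵢ²pⱼ² over allele pairs); whether these match every detail of the software output is an assumption, and the genotypes in the example are hypothetical allele sizes.

```python
import math
from collections import Counter
from itertools import combinations

def locus_stats(genotypes):
    """Diversity statistics for one locus.

    genotypes: list of (allele_1, allele_2) tuples, one per diploid individual.
    """
    n = len(genotypes)
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    freqs = [count / (2 * n) for count in counts.values()]
    sum_sq = sum(p * p for p in freqs)
    he = 1.0 - sum_sq
    ho = sum(1 for a, b in genotypes if a != b) / n
    # PIC subtracts a term for every pair of distinct alleles.
    pic = he - sum(2 * p * p * q * q for p, q in combinations(freqs, 2))
    return {
        "Na": len(counts),
        "Ne": 1.0 / sum_sq,
        "I": -sum(p * math.log(p) for p in freqs),
        "Ho": ho,
        "He": he,
        "uHe": he * 2 * n / (2 * n - 1),
        "F": (he - ho) / he if he > 0 else float("nan"),
        "PIC": pic,
    }

# Hypothetical locus: allele sizes (bp) in four individuals, one heterozygote.
example = [(200, 200), (200, 200), (202, 202), (200, 202)]
for name, value in locus_stats(example).items():
    print(f"{name}: {value:.4f}" if isinstance(value, float) else f"{name}: {value}")
```

Averaging such per-locus values over all loci gives summary figures of the kind reported in Table 3.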