Introduction

Mango (Mangifera indica L., Family: Anacardiaceae, Order: Sapindales) is one of the most important fruit crops and is referred to as the ‘King of fruits’ in the tropical world1. Endowed with the largest mango gene pool in the world2,3,4,5 and a high degree of diversity, India is considered to be the centre of origin of mango, from where it is believed to have spread progressively to the tropical and subtropical regions of the world. India's eco-geographic diversity supports a large number of fruit crop types, and India ranks second in global fruit production, next to China6. The leading mango-producing States of India are Uttar Pradesh (23.06%), Andhra Pradesh (16.06%), Karnataka (9.29%), Telangana (8.54%), Bihar (7.51%), Gujarat (6.30%), Tamil Nadu (5.87%), West Bengal (4.24%) and Odisha (4.14%)7. The wild as well as cultivated forms of mango (Local, Hybrid and Selection) in India exhibit a wide diversity of fruit form, flavour, size and taste8. Unfortunately, at present only about 20 landraces are commercially cultivated, the majority of them restricted to specific regions9.

Genetic diversity underlies species diversity and is considered a key starting point in studying a species, because the magnitude and range of heterogeneity in its populations influence its evolutionary potential10. Determining genetic diversity is critical so that elite genotypes can be identified, multiplied and conserved11. Conservation of mango genetic resources is crucial to the long-term survival, sustainable production and genetic improvement of commercially profitable genotypes. Assessment of the level of genetic diversity and the genetic structure of the populations of a species helps ascertain its current status and threats, and thus could provide a basis for adopting appropriate scientific management policies and devising effective conservation strategies12. It is essential to preserve genetic diversity to promote the adaptability of populations to a changing environment as well as to preserve a large gene pool for future genetic improvement13, the latter achievable through either conventional hybridization or molecular breeding. Knowledge of the genetic background of the parents is a necessary prelude to developing new varieties endowed with superior fruit features and better adapted to changing climatic conditions. It is also important to assess the phylogenetic interrelationships among cultivated relatives. Local elites, being superior-performing genotypes within the same region and sharing the same climatic conditions, can be better candidates for breeding than those available in geographically isolated populations at long distances14.
True-to-type propagation for multiplication and conservation of natural mango populations is imperative to preserve the existing diversity in local populations, to avert the imminent extinction of the elite genotypes available in these localities, and to reduce the risk of losing desirable features (such as fruit quality, for mangoes) owing to uncontrolled natural outbreeding depression, if any15. Reliable and rapid DNA-based diagnostic techniques that discriminate or relate established genotypes and unexplored or under-explored local ones (landraces) can improve the efficacy of genotype management for breeding and conservation of fruit tree species and for production of certified plant material with superior fruit quality16. Besides, documentation in the form of a molecular atlas of DNA fingerprints generated for elite mango genotypes and local landraces holds significance for ascertaining the genetic authenticity (trueness-to-type) of the conserved germplasm.

Information on the extent and structure of genetic variation in germplasm collections is essential for the effective conservation, efficient management and prospective utilization of biodiversity in any crop species. In addition, knowledge of the genetic and population diversity of germplasm collections serves as a solid foundation for crop improvement. It is essential to first define the population diversity within the germplasm to avoid spurious associations in association mapping studies17. Population diversity is widely used in conservation biology to quantify relationships and differences among populations18.

Determination of the extent of genetic diversity in fruit crop populations is essential to guide strategies for their conservation and sustainable utilization for genetic improvement. Estimating and understanding the levels of variability within and between populations not only facilitates the formulation of appropriate conservation strategies, as it reflects the status and survival potential of populations, but also helps to resolve ecological, taxonomic, phylogenetic and demographic questions of great relevance19,20. Conservation of genetic diversity is vital for the long-term persistence of a plant species, because loss of genetic variability within populations may significantly decrease adaptability to environmental change and increase the risk of extinction21. Understanding the relative significance of specific processes that structure diversity within and among populations, such as inbreeding, gene flow, genetic drift and selection, can provide a means to evaluate the future risk of erosion of diversity and to design effective conservation approaches for rare taxa22. Disappearance of a single population would remove any distinctive biological traits that it retains and might eventually lead to varietal elimination21. George et al.23 suggested that examining the extent of genetic diversity within and among populations can help in understanding evolutionary mechanisms, such as genetic drift, and can serve as an indicator of the extent of gene flow and population divergence.

The use of DNA-based markers offers a potential strategy for genetic analysis at the population level. DNA markers are not influenced by environmental conditions and can therefore be used to help describe patterns of genetic variation among plant populations24,25. Different PCR-based DNA marker techniques are available for studying plant population genetics. These may be classified as arbitrarily amplified DNA markers, DNA sequence-based markers and gene-targeted functional markers. Molecular methods differ from each other with respect to important features such as genomic abundance, level of polymorphism detected, locus specificity, reproducibility, technical requirements and cost26. Assessment of genetic diversity within and among populations has been reported in horticultural crops such as wild apricot27, bottle gourd28 and apple29, and in other plants, including Christmas orchids30,31, mungbean32, Pinus nigra33 and Populus wulianensis34, to understand spatial and temporal differences between populations. Yet studies on Indian mangoes are limited to the cultivars of Andhra Pradesh (a State of India) and to the use of SSR markers alone35. No studies have examined population diversity in a diverse collection of Indian mangoes using a range of DNA fingerprinting techniques such as RAPD, ISSR, DAMD, SCoT, CBDP and SSR markers, either individually or cumulatively. Comparative studies of different molecular techniques for measuring population diversity have already been performed in apricot36, Ocimum37, bamboo38, mungbean32, etc. However, only a few reports exist on multiple-marker comparison for genetic diversity in mangoes, particularly those of Thailand, Vietnam and China39 and of the Gir forest region of India40.
The present investigation is a maiden effort to assess genetic variability within and among populations defined by geographic affiliation (East India, West India, North India and South India) and by fruit status (selection, hybrid, indigenous), encompassing 70 Indian mango genotypes and employing arbitrary (RAPD, ISSR, DAMD), gene-targeted (SCoT, CBDP) and sequence-specific (SSR) marker systems, individually and in combination. Our work had the following objectives: (i) to compare the efficacy of various PCR-based markers (used either individually or cumulatively) for analysing genetic diversity among and within populations of Indian mangoes from various geographical locations and fruit status backgrounds; (ii) to investigate population differentiation based on genetic diversity parameters for further use in population and conservation genetics; and (iii) to estimate the gene flow among populations.

Results

Individual and cumulative DNA marker-based banding statistics

Of a total of 80 RAPD primers subjected to initial screening, 65 primers yielded clear and reproducible banding patterns ranging in size from 100 to 3000 bp, with an average of 15.91 bands per primer. Twenty-five different ISSR primers yielded clear and bright bands of 100–2500 bp, with an average of 18.04 bands per primer (Table 2). Initial screening of 41 DAMD primers identified 23 primers with an average of 16.57 bands per primer. Of 48 different SCoT primers tested, 22 produced clear and bright bands of 100–3000 bp, with an average of 17.64 bands per primer. A total of 56 CBDP primers were screened, of which 33 primers generated distinct and reproducible banding patterns of different band lengths (100–3000 bp), with an average of 16.03 bands per primer. Of a total of 72 SSR primers screened, 40 primer pairs were scorable and considerably informative. All the tested loci were polymorphic, with an average of 4.13 alleles/locus. The representative banding patterns of 70 mango genotypes using OPA 18 (RAPD), ISSR-9 (ISSR), HBV (DAMD), SCoT 8 (SCoT), CAAT-3 (CBDP) and SSR-20 (SSR) primers are given in the supplementary online files (Suppl. Figure 1a-f). For cumulative analysis, a total of 113 arbitrary primers (65 RAPD, 25 ISSR and 23 DAMD) were employed, which generated an average of 16.84 bands per primer across all mango genotypes examined. The combined gene-targeted markers (SCoT + CBDP) yielded an average of 16.83 bands per primer. In total, 208 primers yielded an average of 12.60 bands per primer among all mango genotypes under investigation (Tables 1, 2).
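
The cumulative averages reported above can be checked with simple arithmetic. The sketch below is an illustration only. It assumes, since the text does not say how the figures were pooled, that each cumulative value is an unweighted mean of the per-marker averages; under that assumption the reported 16.84, 16.83 and 12.60 are reproduced to within rounding. A primer-weighted mean is included for contrast. The function and variable names are hypothetical.

```python
# Primer counts and average bands per primer (alleles per locus for SSR),
# copied from the text above.
markers = {
    "RAPD": (65, 15.91),
    "ISSR": (25, 18.04),
    "DAMD": (23, 16.57),
    "SCoT": (22, 17.64),
    "CBDP": (33, 16.03),
    "SSR": (40, 4.13),
}


def unweighted_mean(names):
    """Simple mean of the per-marker averages (assumed pooling rule)."""
    return sum(markers[name][1] for name in names) / len(names)


def weighted_mean(names):
    """Mean weighted by the number of primers per marker type (for contrast)."""
    primers = sum(markers[name][0] for name in names)
    return sum(markers[name][0] * markers[name][1] for name in names) / primers


arbitrary = unweighted_mean(["RAPD", "ISSR", "DAMD"])
gene_targeted = unweighted_mean(["SCoT", "CBDP"])
# Assumed: the overall figure averages the three marker classes.
overall = (arbitrary + gene_targeted + markers["SSR"][1]) / 3
total_primers = sum(count for count, _ in markers.values())

print(f"arbitrary markers (unweighted): {arbitrary:.3f}")
print(f"gene-targeted markers (unweighted): {gene_targeted:.3f}")
print(f"all marker classes (unweighted): {overall:.3f}")
print(f"arbitrary markers (primer-weighted): "
      f"{weighted_mean(['RAPD', 'ISSR', 'DAMD']):.3f}")
print(f"total primers: {total_primers}")
```

If the averages were weighted by primer count instead, the arbitrary-marker figure would come out somewhat lower, so the pooling rule is worth confirming against Table 2.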

Table 1 Indian mango genotypes of 4 Geographical Populations (East India, West India, North India, South India) and 3 Fruit Status-based Populations (Selection, Hybrid, Landrace) audited.
Table 2 Estimates of comparative performance of individual and cumulative DNA marker techniques applied to selected Indian mango genotypes.

Genetic variability within and among populations and gene flow

Variability across geographical populations

Based on their geographical origin, the 70 mango genotypes were grouped into four populations: East India (EI), West India (WI), North India (NI) and South India (SI) (Table 1). Additionally, these genotypes were grouped into three populations based on their fruit status: Selection (S), Hybrid (H) and Local (L) (Table 1). The comparative performance of the marker systems was evaluated on different criteria to determine the utility of each method, either individually or in combination, for diversity studies in mango populations (Tables 2 and 3). The genetic variability parameters obtained with the different marker systems, either individually or in combination, are presented in Table 3: the number of alleles (Na), effective number of alleles (Ne), Nei’s genetic diversity (h), Shannon’s information index (I), polymorphic band percentage (PB%), total genetic diversity (Ht), genetic diversity within population (Hs), coefficient of genetic differentiation (Gst) and gene flow (Nm).
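
To make the within-population parameters concrete, the following minimal sketch computes Na, Ne, h, I and PB% from a binary band matrix for a dominant marker. It rests on assumptions that the text does not state: two alleles per locus, Hardy-Weinberg equilibrium (so the null-allele frequency is the square root of the band-absence frequency), natural logarithms for Shannon's index, and simple averaging over loci. The function names and the toy data are hypothetical.

```python
import math


def locus_stats(bands):
    """Statistics for one locus; bands is a sequence of 1 (present) / 0 (absent)."""
    n = len(bands)
    q = math.sqrt(bands.count(0) / n)  # null-allele frequency (assumes HWE)
    p = 1.0 - q
    homozygosity = p * p + q * q
    h = 1.0 - homozygosity             # Nei's gene diversity
    ne = 1.0 / homozygosity            # effective number of alleles
    shannon = -sum(f * math.log(f) for f in (p, q) if f > 0)
    polymorphic = 0 < sum(bands) < n   # band neither fixed nor absent
    na = 2 if polymorphic else 1       # observed number of alleles
    return na, ne, h, shannon, polymorphic


def population_stats(matrix):
    """Average the per-locus statistics; rows are genotypes, columns are loci."""
    per_locus = [locus_stats(column) for column in zip(*matrix)]
    count = len(per_locus)
    return {
        "Na": sum(s[0] for s in per_locus) / count,
        "Ne": sum(s[1] for s in per_locus) / count,
        "h": sum(s[2] for s in per_locus) / count,
        "I": sum(s[3] for s in per_locus) / count,
        "PB%": 100.0 * sum(s[4] for s in per_locus) / count,
    }


# Toy data: four genotypes scored at two loci (hypothetical values).
toy = [[1, 1], [1, 1], [1, 1], [1, 0]]
result = population_stats(toy)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions Na equals 1 + PB%/100 by construction, which agrees with the pattern in the means reported in this section (for example Na = 1.76 with PB% = 76.1 for SSR).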

Table 3 Mean genetic variability statistics among 4 Geographical Populations (East India, West India, North India, South India) and 3 Fruit Status Populations (S = Selection, H = Hybrid and L = Landrace) of selected Indian mangoes based on individual and cumulative DNA marker analysis.

Co-dominant SSR markers detected the highest level of variability, as shown by the highest mean values of Na (1.76), Ne (1.48), h (0.28), I (0.41), PB% (76.1), Ht (0.31) and Hs (0.28) among all markers. On average, CBDP and SCoT markers closely followed SSR in terms of Na (1.74, 1.71), Ne (1.42), h (0.25), I (0.38, 0.37), PB% (74.10, 71.48), Ht (0.28, 0.27) and Hs (0.25). The RAPD, ISSR and DAMD markers exhibited comparatively lower mean values for the genetic diversity indices. Among the arbitrary markers, Na was the highest for RAPD (1.68), while it was equal for ISSR and DAMD (1.66). The polymorphic band percentage (PB%) was also higher for RAPD (68.03%) than for ISSR (66.33%) and DAMD (65.69%). The mean effective number of alleles (Ne), total genetic diversity (Ht) and genetic diversity within population (Hs) were highest for ISSR (1.38, 0.27, 0.23), followed by RAPD (1.37, 0.26, 0.22) and DAMD (1.34, 0.23, 0.20), while Nei’s genetic diversity (h) and mean Shannon’s information index (I) were equally high for ISSR and RAPD (0.23, 0.34) and lower for DAMD (0.20, 0.31). The genetic structure of the geographical/fruit status populations was also measured using the combination of arbitrary markers (RAPD + ISSR + DAMD) and of gene-targeted markers (SCoT + CBDP); the latter gave higher values for the genetic diversity parameters (Na = 1.73, Ne = 1.42, h = 0.25, I = 0.38, PB% = 73.22, Ht = 0.28 and Hs = 0.25) than the former (Na = 1.67, Ne = 1.37, h = 0.22, I = 0.34, PB% = 67.62, Ht = 0.26 and Hs = 0.22). Hence, the gene-targeted markers were more efficient than arbitrary markers for evaluating the genetic diversity of different populations. The diversity parameters indicate that Indian mango germplasm holds a considerably high level of genetic diversity.
Among all the markers tested on the four geographical populations, the coefficient of genetic differentiation (Gst), an estimator of population substructure, was highest with SSR and DAMD (0.156 and 0.150), indicating that about 15–16% of the total genetic variability resided among the populations. Consistent with these Gst values, gene flow (Nm) was lowest with SSR and DAMD (2.714 and 2.836, respectively). Gst and Nm also varied across RAPD (0.140, 3.071), ISSR (0.104, 4.298), SCoT (0.10, 4.494) and CBDP (0.102, 4.402) markers (Table 3). The mean Gst of 0.147 revealed by arbitrary markers indicated that about 15% of the genetic diversity resided among the populations, which was higher than that indicated by gene-targeted markers (Gst = 0.101). Accordingly, the gene flow estimated by gene-targeted markers (SCoT + CBDP) was high (4.432) relative to that of arbitrary markers (RAPD + ISSR + DAMD) (2.907) (Table 3).
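
The inverse relationship between Gst and Nm can be illustrated with a short calculation. The sketch below assumes the widely used estimator Nm = 0.5 (1 − Gst) / Gst; the text does not state which estimator was applied, but this one reproduces the reported Nm values to within the rounding of Gst. The function name is hypothetical.

```python
def gene_flow_from_gst(gst):
    """Estimate Nm from Gst with Nm = 0.5 * (1 - Gst) / Gst (assumed estimator)."""
    if not 0 < gst < 1:
        raise ValueError("Gst must lie strictly between 0 and 1")
    return 0.5 * (1.0 - gst) / gst


# (Gst, reported Nm) pairs for the geographical populations, from the text.
reported = {
    "SSR": (0.156, 2.714),
    "DAMD": (0.150, 2.836),
    "RAPD": (0.140, 3.071),
    "ISSR": (0.104, 4.298),
    "SCoT": (0.10, 4.494),
    "CBDP": (0.102, 4.402),
    "RAPD + ISSR + DAMD": (0.147, 2.907),
    "SCoT + CBDP": (0.101, 4.432),
}

for marker, (gst, nm_reported) in reported.items():
    nm = gene_flow_from_gst(gst)
    print(f"{marker}: Gst = {gst}, computed Nm = {nm:.3f}, reported Nm = {nm_reported}")
```

Small discrepancies are expected because the Gst values in the text are rounded to two or three decimals before Nm is recomputed here.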

Among the four geographic populations, the mango genotypes of East India (EI) exhibited the maximum variability. The level of genetic variability of the East India population was measured in terms of the observed number of alleles (Na), effective number of alleles (Ne), Nei’s genetic diversity (h), Shannon’s information index (I) and percent polymorphism (PB%). The values recorded were, respectively, 1.88, 1.42, 0.25, 0.39 and 88.05% with RAPD; 1.86, 1.45, 0.27, 0.40 and 85.63% with ISSR; 1.84, 1.39, 0.24, 0.37 and 84.31% with DAMD; 1.92, 1.47, 0.28, 0.43 and 92.14% with CBDP; 1.91, 1.58, 0.32, 0.47 and 91.35% with SSR; 1.87, 1.43, 0.26, 0.40 and 87.76% with RAPD + ISSR + DAMD; and 1.89, 1.46, 0.28, 0.42 and 89.10% with SCoT + CBDP (Table 3). The only exception was the SCoT marker, which detected the highest diversity in the South India population (PB% = 84.51, Na = 1.84, Ne = 1.47, h = 0.28 and I = 0.43). Among the different markers used, SSR recorded the highest values of Ne, h and I, while CBDP revealed the highest polymorphism level (92.14%) and number of alleles (1.92). The South Indian cultivars also exhibited high diversity values with RAPD (h = 0.24, I = 0.37, PB% = 75.30), ISSR (h = 0.25, I = 0.37, PB% = 70.69), DAMD (h = 0.21, I = 0.33, PB% = 74.51), CBDP (h = 0.26, I = 0.40, PB% = 80) and SSR (h = 0.29, I = 0.41, PB% = 82.61) markers. The North Indian and South Indian genotypes showed moderate values for the genetic diversity indices. The observed number of alleles was higher than the effective number of alleles in all populations. Private alleles, i.e. alleles unique to a single population, were observed, the most being found in the East Indian (EI) population (3–16), followed by the South Indian (SI) population (3–6).

Variability across fruit status populations

Population analysis based on fruit status revealed notable patterns. The intra-population genetic diversity analysis revealed the highest values of Na (1.84, 1.82, 1.80, 1.86, 1.91), Ne (1.42, 1.45, 1.40, 1.48, 1.57), h (0.25, 0.26, 0.24, 0.28, 0.32), I (0.39, 0.40, 0.37, 0.43, 0.47) and PB% (84.46%, 82.18%, 80.39%, 85.71%, 82.61%) with RAPD, ISSR, DAMD, CBDP and SSR markers, respectively, among the genotypes of the local population (L). The exception was the SCoT marker, which uncovered the highest Na (1.86), Ne (1.46), h (0.28), I (0.42) and PB% (85.92%) among the genotypes of the hybrid population (H) (Table 3). The combined marker techniques, SCoT + CBDP and RAPD + ISSR + DAMD, showed a similar pattern for Na (1.84, 1.83), Ne (1.46, 1.43), h (0.28, 0.26), I (0.42, 0.39) and PB% (83.89%, 83.19%), with the highest values observed in the genotypes of the local population (L).

The population of Selection genotypes (S) exhibited higher variability than that of Hybrids (H) with ISSR markers (S / H: Na = 1.80 / 1.71, Ne = 1.44 / 1.42, h = 0.26 / 0.25, I = 0.39 / 0.37, PB% = 79.89 / 71.26). The SSR and CBDP markers showed a similar trend in variability for the selection and hybrid populations. Interestingly, CBDP and SSR markers had higher mean values of Na (1.87, 1.88) across all the populations than SCoT and RAPD (1.80), ISSR (1.78) and DAMD (1.74). Average values of the other parameters, namely Ne (1.51), h (0.29), I (0.45), PB% (86.95), Ht (0.32) and Hs (0.30), were highest for SSR. The gene-targeted markers CBDP and SCoT had Ne (1.45, 1.44), h (0.27, 0.26), I (0.41, 0.40), PB% (84.05, 80.28), Ht (0.28, 0.27) and Hs (0.27, 0.26), respectively. Among the arbitrary markers, the mean effective number of alleles (Ne), Nei’s genetic diversity (h), Shannon’s information index (I) and total genetic diversity (Ht) were highest for ISSR (1.43, 0.26, 0.39, 0.27), followed by RAPD (1.41, 0.25, 0.35, 0.27) and DAMD (1.36, 0.22, 0.34, 0.23).

PB% was highest for RAPD (80.48), followed by ISSR (77.78) and DAMD (74.51), while genetic diversity within population (Hs) was similarly high for ISSR and RAPD (0.26, 0.25) and lower for DAMD (0.22). Gene-targeted markers showed higher levels of diversity in terms of average Na (1.83), Ne (1.45), h (0.27), I (0.41), PB% (82.78), Ht (0.28) and Hs (0.27) than arbitrary markers (Na = 1.79, Ne = 1.41, h = 0.25, I = 0.38, PB% = 78.85, Ht = 0.27 and Hs = 0.25). Gst was the highest for RAPD (0.115), followed by SSR (0.07), DAMD (0.065), ISSR (0.061), CBDP (0.054) and SCoT (0.053). This indicated the superior ability of the RAPD markers to detect variation among local, hybrid and selection genotypes, whereas the trend for gene flow (Nm) among the populations was the reverse, being the highest for SCoT (8.91), followed by CBDP (8.71), ISSR (7.65), DAMD (7.19), SSR (6.56) and RAPD (3.82).

Analysis of molecular variance (AMOVA)

Variability across geographical populations

Hierarchical AMOVA was performed to estimate population differentiation among the selected mango genotypes. Across the six marker systems, 3–13% of the variation was partitioned among geographic populations, while most of the variation (87–97%) was present within geographic populations. RAPD data showed 9% of the variation partitioned among geographic populations and 91% within them (Fig. 1Aa, Supplementary Table S1). Similarly, ISSR-based AMOVA showed 5% and 95% variation among and within populations, respectively (Fig. 1Ab, Supplementary Table S1). DAMD-based AMOVA accounted for 12% variation among geographic populations and 88% within them (Fig. 1Ac). SCoT-based AMOVA showed the least variation among populations (3%) and the maximum variation within populations (97%) (Fig. 1Ad). With the CBDP marker, 4% of the variation was found among populations and 96% within populations (Fig. 1Ae, Supplementary Table S1). Of the six marker systems, SSR showed the maximum variation among populations (13%) (Fig. 1Af, Supplementary Table S1). The overall ΦPT values for RAPD, ISSR, DAMD, SCoT, CBDP and SSR were 0.086, 0.047, 0.117, 0.029, 0.042 and 0.122, respectively. With the combined markers, AMOVA again showed most of the variation within populations, 96% for SCoT + CBDP and 90% for RAPD + ISSR + DAMD, and correspondingly little among populations (4% and 10%; ΦPT = 0.038 and 0.098) (Figs. 1Ag–h, Supplementary Table S1).
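
For readers unfamiliar with how the among- and within-population percentages arise, the sketch below implements a minimal one-way AMOVA on binary band profiles and returns ΦPT, the share of total variance that lies among populations. It is an illustration under stated assumptions, not the software or settings used in this study: distances are simple counts of band mismatches, there is a single grouping level, and no permutation test of significance is performed. All names and the toy profiles are hypothetical.

```python
from itertools import combinations


def squared_distance(a, b):
    """Number of band mismatches between two binary profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sum_of_squares(profiles):
    """Sum of pairwise squared distances divided by the group size."""
    pairs = combinations(profiles, 2)
    return sum(squared_distance(a, b) for a, b in pairs) / len(profiles)


def amova_phi_pt(populations):
    """One-way AMOVA on {population name: list of binary band profiles}."""
    groups = list(populations.values())
    everyone = [profile for group in groups for profile in group]
    n_total, n_groups = len(everyone), len(groups)

    ss_within = sum(sum_of_squares(group) for group in groups)
    ss_among = sum_of_squares(everyone) - ss_within

    ms_among = ss_among / (n_groups - 1)
    ms_within = ss_within / (n_total - n_groups)

    # Effective group size, which corrects for unequal sample sizes.
    n0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (n_groups - 1)

    var_among = (ms_among - ms_within) / n0
    var_within = ms_within
    return var_among / (var_among + var_within)


# Toy example: two populations of three genotypes scored at four loci.
toy = {
    "EI": [[1, 0, 1, 1], [1, 0, 1, 0], [1, 1, 1, 1]],
    "WI": [[0, 1, 0, 0], [0, 1, 1, 0], [0, 1, 0, 1]],
}
phi_pt = amova_phi_pt(toy)
print(f"PhiPT = {phi_pt:.3f}; among populations = {100 * phi_pt:.1f}%")
```

Because ΦPT is the among-population fraction of the total variance, a value such as 0.086 corresponds to roughly 9% of the variation among populations, which is how the ΦPT values and percentages quoted above relate to each other.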

Figure 1

AMOVA analysis of (A) 4 Geographical Populations (East India = EI, West India = WI, North India = NI, South India = SI) and (B) Fruit Status Populations (S = Selection, H = Hybrid and L = Landrace) of selected Indian mangoes based on individual and cumulative DNA marker systems: (a) RAPD (b) ISSR (c) DAMD, (d) SCoT (e) CBDP (f) SSR (g) RAPD + ISSR + DAMD (h) SCoT + CBDP.

Figure 2

Neighbour joining dendrogram for (A) 4 Geographical Populations (East India = EI, West India = WI, North India = NI, South India = SI) and (B) 3 Fruit Status Populations (S = Selection, H = Hybrid and L = Landrace) of selected Indian mangoes, based on Nei’s genetic distance using DNA marker systems: (a) cumulative Arbitrary (RAPD + ISSR + DAMD), (b) cumulative Gene targeted (SCoT + CBDP), (c) SSR.

Variability across fruit status populations

An alternative AMOVA model was fitted with populations defined by fruit status (local, hybrid and selection). Taking all the marker systems into account, 3–11% of the variance was partitioned among populations and 89–97% within populations. The majority of the variation was attributable to differences among genotypes within populations with RAPD (89%), ISSR (95%), DAMD (96%), SCoT (97%), CBDP (97%) and SSR (96%), while the among-population variance components were very low, i.e. 11% (RAPD), 5% (ISSR), 4% (DAMD, SSR) and 3% (SCoT, CBDP). The overall ΦPT values for RAPD, ISSR, DAMD, SCoT, CBDP and SSR were 0.084, 0.052, 0.042, 0.027, 0.028 and 0.039, respectively (Fig. 1Ba-f, Supplementary Table S2). AMOVA based on cumulative arbitrary markers showed a similar trend, with about 6% of the variation among populations and 94% within populations (Fig. 1Bg, Supplementary Table S2). Cumulative gene-targeted marker analysis showed 97% of the variation within populations and 3% among populations (Fig. 1Bh, Supplementary Table S2).

Genetic similarity and cluster analysis among populations

Geographical populations

To further illustrate the phylogenetic relationships among the four geographic populations, the Neighbor-joining method was employed to construct phylogenetic trees (NJ dendrograms) from SSR, cumulative RAPD + ISSR + DAMD and cumulative SCoT + CBDP markers (Supplementary Table S3, Fig. 2Aa-c). To illustrate the phylogenetic relationships among the 70 individual mango genotypes, a dendrogram was plotted using Jaccard’s similarity coefficient of the cumulative RAPD + ISSR + DAMD + SCoT + CBDP + SSR marker profile (Suppl. Figure 2a). A similar grouping of the 70 Indian mango genotypes was observed in the 2D and 3D PCA (Suppl. Figure 2b & c). Nei’s genetic distance and genetic identity were both calculated and are presented below and above the diagonal, respectively, of Supplementary Table S3. Nei’s genetic identity ranged from 0.83 to 0.93 for RAPD + ISSR + DAMD, 0.90 to 0.95 for SCoT + CBDP and 0.81 to 0.94 for SSR. The genetic relationship between the EI and SI populations was the closest with all three methods (SSR: 0.94; SCoT + CBDP: 0.95; RAPD + ISSR + DAMD: 0.93), hence they clustered together. For SSR markers, the NJ dendrogram showed that the EI and SI populations (0.94) grouped together, as did the WI and NI populations (0.91), while the lowest genetic identity (0.81) was between the EI and WI populations. In the RAPD + ISSR + DAMD based dendrogram, the genotypes of the NI population appeared closely related to the SI and EI pair (NI vs SI: 0.91; NI vs EI: 0.90), while the genotypes from WI were slightly distinct from the other three populations (WI vs EI: 0.83; WI vs SI: 0.88; WI vs NI: 0.90). The same pattern of clustering was also seen with SCoT + CBDP, but with higher identity values (SI vs NI: 0.93; EI vs NI: 0.92; EI vs WI: 0.91; NI vs WI: 0.91; SI vs WI: 0.90).
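
As a brief illustration of the identity and distance measures used here, the sketch below computes Nei's (1972) genetic identity between two populations from per-locus allele frequencies and converts identity to distance with D = −ln(I). The allele frequencies are invented for the example and the function names are hypothetical; the estimates in this study come from the marker data summarized in Supplementary Table S3.

```python
import math


def nei_identity(freqs_x, freqs_y):
    """Nei's (1972) identity; each argument is a list of per-locus frequency lists."""
    jxy = jx = jy = 0.0
    for locus_x, locus_y in zip(freqs_x, freqs_y):
        jxy += sum(x * y for x, y in zip(locus_x, locus_y))
        jx += sum(x * x for x in locus_x)
        jy += sum(y * y for y in locus_y)
    return jxy / math.sqrt(jx * jy)


def nei_distance(identity):
    """Nei's standard genetic distance from genetic identity."""
    return -math.log(identity)


# Invented allele frequencies at two loci for two populations.
pop_x = [[0.8, 0.2], [0.6, 0.4]]
pop_y = [[0.5, 0.5], [0.7, 0.3]]
identity = nei_identity(pop_x, pop_y)
print(f"identity = {identity:.3f}, distance = {nei_distance(identity):.3f}")

# Converting identity values quoted in the text for SSR markers.
for pair, value in {"EI vs SI": 0.94, "EI vs WI": 0.81}.items():
    print(f"{pair}: identity = {value}, distance = {nei_distance(value):.3f}")
```

An identity close to 1 therefore corresponds to a distance close to 0, which is why the most similar population pairs join first in the NJ dendrograms.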

Fruit status populations

Similarly, the Neighbor-joining method was employed to construct phylogenetic trees (NJ dendrograms) for the three fruit status populations (L, H, S) using SSR, cumulative RAPD + ISSR + DAMD and cumulative SCoT + CBDP markers (Supplementary Table S4, Fig. 2Ba-c). Nei’s genetic identity ranged from 0.91 to 0.96 for RAPD + ISSR + DAMD, 0.94 to 0.97 for SCoT + CBDP and 0.92 to 0.96 for SSR. The genetic identity between the Hybrid (H) and Selection (S) populations was the maximum with all three methods (SSR: 0.96; SCoT + CBDP: 0.97; RAPD + ISSR + DAMD: 0.96), indicating the closest genetic relationship. In the SCoT + CBDP based dendrogram, the hybrid population (H) appeared closely related to the selection population (S) (0.97), while the local population (L) was distinct from both (L vs S: 0.94; L vs H: 0.94). The same pattern of clustering was seen with SSR, but the distinction was more pronounced (L vs S: 0.92; L vs H: 0.92). For RAPD + ISSR + DAMD markers, ‘H’ showed the maximum genetic identity with both ‘S’ and ‘L’ (0.96). The NJ dendrogram showed that the ‘H’ and ‘S’ populations grouped together, while the lowest genetic identity (0.91) was between the ‘L’ and ‘S’ populations.

Discussion

The maintenance of genetic variation is a key objective of conservation41. Determining genetic diversity within and among populations is central to guiding strategies for the conservation and sustainable exploitation of crops for genetic improvement. In the current study, the genetic variability parameters (Na, Ne, h, I, PB%, Ht, Hs, Gst and Nm) obtained with the various marker systems differed among mango populations defined by geographic origin and fruit status. Genetic diversity indices such as percent polymorphism, observed number of alleles, effective number of alleles, Nei’s genetic diversity and Shannon's index reflect diversity and differentiation among germplasm collections and reveal genetic diversity within and between populations. The higher the indices, the greater the genetic diversity.

Among the mango germplasm collected from four different geographical regions, the highest genetic diversity was observed in the East Indian population (EI), followed by South India (SI) > West India (WI) > North India (NI), suggesting that mango genotypes from the eastern and southern parts of India possess relatively higher genetic variation and have a greater capability to acclimatize and evolve than the other populations. The observed number of alleles was higher than the effective number of alleles in all populations. Similar observations were reported in other tree species such as apricot42 and Zanthoxylum spp.15. The primary reasons for differences in genetic variation can be attributed to climatic conditions43, topography44, mating system and seed dispersal method45. Genetic diversity is often positively associated with population size; larger populations generally hold proportionately higher levels of genetic diversity. While studying different apricot cultivars of North China, Li et al.46 also emphasized that genetic diversity was affected by population size and that a larger sample size would generate more accurate results. In our case, the high level of genetic diversity observed in EI may be attributed to potential genetic status, eco-geographical factors, advantageous tropical growing conditions, wide geographic distribution and population size. Interestingly, most of the East Indian genotypes are indigenous; hence, as expected, the local genotypes generally showed more diversity than the hybrid or selection genotypes in the population analysis based on fruit status. This may be due to unique alleles present in these indigenous populations that have been lost from the hybrid and selection populations during prolonged cultivation or long-term adoption. The presence of ‘private alleles’ in different populations represents a unique source of genetic diversity in the studied Indian mango germplasm.
The East Indian and South Indian populations had the highest numbers of these private alleles, as detected using all six marker systems, i.e. RAPD (16, 6), ISSR (12, 3), SCoT (3, 5), DAMD (8, 6), CBDP (8, 2) and SSR (8, 4). The private alleles can be used as molecular signatures (ID marks) in fingerprinting studies, and such data could be used to assess the genetic purity of the populations. Most of the local populations encompass indigenous genotypes of Odisha State, and these results demonstrate the potential of local germplasm as a source of unique and favourable alleles for mango germplasm improvement programmes. Hence, such information will aid the selection of cultivars for germplasm conservation and for use in modern-day mango breeding, by providing information on diverse genetic backgrounds in native and local genotypes.

The existing variation among genotypes or groups of genotypes can be identified using a specific statistical method or a combination of methods47. The selection of a particular type of molecular marker is important and depends critically on the intended use48. Sivaprakash et al.49 suggested that the ability of a marker system to resolve genetic variation may be directly related to the degree of polymorphism. Polymorphism in a given population is often due to the existence of genetic variants, represented by the number of alleles at a locus and their frequency distribution in the population. Comparison of RAPD, ISSR, DAMD, SSR, SCoT and CBDP analyses with respect to the diversity parameters showed a substantial level of genetic variation and a broad genetic base in the studied mango populations. Genetic variability among the Indian mangoes selected for geographical and fruit status population analysis, as estimated by parameters such as Na, Ne, h, I, PB%, Ht and Hs, was found to be the highest with SSRs, followed by the gene-targeted markers CBDP and SCoT, thus reflecting the preeminence of SSRs in detecting genetic diversity. Among the arbitrary markers, the polymorphic band percentage (PB%) was higher in both geographical and fruit status populations for RAPD (68.03, 80.48) than for ISSR (66.33, 77.78) and DAMD (65.69, 74.51). The higher genetic diversity revealed by SSR markers may be attributable to their multi-allelic nature, high polymorphism and information content, as well as to replication slippage, the mechanism that generates SSR allelic diversity. High values of Nei’s genetic diversity, Shannon’s index, effective number of alleles, observed number of alleles and percent polymorphism were also reported in species such as Zanthoxylum spp.15, Punica granatum L.50 and Lagenaria siceraria28 using different markers.

The high levels of polymorphism obtained with different markers for geographical (> 65%) and fruit status (> 74%) populations clearly proved their usefulness in genetic variability studies on Indian mango populations. Variation in the polymorphism detected by different primers within the marker systems may be attributed to the fact that their specificity and efficiency are governed by different nucleotide sequences. The gene targeted and microsatellite markers generated abundant polymorphism; thus, they could be applied to identify plant materials having close relationships in different populations. In addition to polymorphism detection, their higher scores of Na, Ne, h, I, PB%, Ht and Hs than those of the other marker systems studied were perhaps due to their derivation from genic regions of the genome.

The genetic differentiation of a species reflects the interactions of various evolutionary processes, including long-term evolutionary history (such as shifts in distribution, habitat fragmentation and population isolation), mutation, genetic drift, mating system, gene flow and natural selection51. Various factors, viz. geographical isolation, population fragmentation, breeding system and genetic drift, may be responsible for high population differentiation52. The coefficient of gene differentiation (Gst) indicates the level of differentiation according to the scale: low (0.00–0.05), moderate (0.06–0.15), high (0.16–0.25) or very high (> 0.25)53. Gst-derived gene flow (Nm) is another measure relevant to the genetic variability of a population, and it can affect the genetic structure too. Gene flow allows members of one gene pool to mate with members of another gene pool, leading to a shift in allele frequencies and a decrease in the degree of population differentiation54. Consequently, the more evident the gene flow, the lower the degree of genetic differentiation. Estimates of gene flow (Nm) have been classified as low (Nm < 1), moderate (Nm > 1) and extensive (Nm > 4) in population genetics55. Nm values greater than one are strong enough to prevent substantial differentiation due to genetic drift56. Geographic origin based populations were moderately structured according to all six marker systems, individually and in combination, with Gst values of 0.10–0.15 and overall Gst-derived estimates of gene flow of 2.71–4.49. The moderate to extensive gene flow among the populations signifies a substantial extent of gene exchange between them. The mean coefficient of gene differentiation (Gst) of 0.147 indicated that about 15% of the genetic diversity resided among the populations according to arbitrary markers (RAPD + ISSR + DAMD), which is higher than the differentiation revealed by gene targeted markers (Gst = 0.101). The gene flow estimate from gene targeted markers was high (4.432) and exceeded that from arbitrary markers (2.907). Gst and Nm also varied among RAPD (0.140, 3.071), ISSR (0.104, 4.298), SCoT (0.10, 4.494) and CBDP (0.102, 4.402) markers, respectively. Such differentiation is in contrast to observations in cowpea (Gst = 0.409)57. The Gst and Nm ranges obtained in the present study using different marker types were comparable to the values (Gst: 0.087–0.180, Nm: 2.278–5.240) recorded for 46 different accessions of a wild plant, Amygdalus mira, examined using SSR, ISSR and SSR + ISSR markers58.
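The link between Gst and Nm described above can be sketched in a few lines. This is an illustrative sketch only: it assumes the relation Nm = 0.5(1 − Gst)/Gst, which is not stated in the text but reproduces the RAPD (0.140, 3.071) and CBDP (0.102, 4.402) pairs reported here. The function names are hypothetical, and the handling of values that fall between the published class boundaries (e.g. between 0.05 and 0.06) is an assumption.

```python
def nm_from_gst(gst: float) -> float:
    """Gst-derived gene flow, assuming Nm = 0.5 * (1 - Gst) / Gst."""
    if not 0.0 < gst <= 1.0:
        raise ValueError("Gst must lie in the interval (0, 1]")
    return 0.5 * (1.0 - gst) / gst


def classify_gst(gst: float) -> str:
    """Differentiation class on the scale quoted in the text.

    Boundary handling (<= 0.05, <= 0.15, <= 0.25) is an assumption,
    because the quoted classes leave small gaps between them.
    """
    if gst <= 0.05:
        return "low"
    if gst <= 0.15:
        return "moderate"
    if gst <= 0.25:
        return "high"
    return "very high"


def classify_nm(nm: float) -> str:
    """Gene flow class: low (Nm < 1), moderate (1 to 4), extensive (> 4)."""
    if nm < 1.0:
        return "low"
    if nm <= 4.0:
        return "moderate"
    return "extensive"


# Gst values reported for the geographic populations
for marker, gst in [("RAPD", 0.140), ("ISSR", 0.104), ("CBDP", 0.102)]:
    nm = nm_from_gst(gst)
    print(marker, round(nm, 3), classify_gst(gst), classify_nm(nm))
```

Small differences from the published Nm values (for instance for ISSR) are expected, because the Gst values in the text are rounded to three decimals.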

Investigation of an alternative population grouping based on fruit status revealed a low-to-moderate extent of genetic differentiation and high gene flow (Gst = 0.053–0.115, Nm = 3.82–8.90). The Gst value was highest for RAPD (0.115), followed by SSR (0.07), DAMD (0.065), ISSR (0.061), CBDP (0.054) and SCoT (0.053), which indicates that RAPD markers had the greatest ability to resolve the variation among local, hybrid and selection genotypes, whilst the trend for gene flow (Nm) among the populations was reversed, being maximum for SCoT (8.91), followed by CBDP (8.71), ISSR (7.65), DAMD (7.19), SSR (6.56) and RAPD (3.82). Low levels of Gst and high Nm, similar to those of the present study, were also obtained for bottle gourd28. SSR and DAMD markers showed slightly more differentiating ability with respect to geography, as did RAPDs for local, hybrid and selection populations.

These results were further supported by the AMOVA, which revealed that most (> 87%) of the total genetic diversity detected by all six marker systems was distributed within populations, whereas a smaller fraction (3–13%) was attributed to differences among populations. Similar patterns of greater genetic variation within populations have been demonstrated in other perennial species, for instance fig59, strawberry60, mulberry61 and Amygdalus mira Koehne58, using different marker systems. SSR and DAMD partitioned the most variation (13% and 12%, respectively) among geographic populations, whereas RAPD partitioned the most variation (11%) among local, hybrid and selection populations. Similar to our results, AMOVA of 19 varieties of Ficus carica49 showed that SSR accounted for 9.82% of among-group variation, higher than RAPD (0.71%), ISSR (6.69%) and RAPD + ISSR (2.59%). The overall ΦPT values confirmed that geographic distribution had little effect on the genetic diversity of mango genotypes. This study clearly indicates frequent exchange of genes among the populations, perhaps mediated by rampant cross-pollination.

The levels of genetic diversity and the distribution of variability within and between plant populations have generally been interpreted as the result of the combination of the reproductive system and the history of the species62. Perennial and allogamous species typically exhibit higher levels of genetic diversity within, rather than between, populations, in contrast to inbreeding or selfing annual species, thereby indicating the influence of biological characteristics on these parameters63. Previous reports also revealed that self-pollinating species have relatively less within-population genetic variation than out-crossing species64. Nm may be greater in out-crossing species because pollen dispersal mechanisms are much more developed in perennial out-crossers than in inbreds or partially self-pollinated herbaceous plants. Furthermore, the geographic distribution of a species is also highly dependent on the type of pollinators. Potential factors such as the widespread nature of the species and pollination by insects may have contributed to this scattered pattern of genetic divergence, leading to less population differentiation, high gene flow and frequent gene exchange. The results of the present study suggest that more of the genetic variation of the species could be captured by sampling a larger number of plants from each population or geographic region. Previous studies also demonstrated that gene mutation, gene flow, population size and sampling strategy can influence genetic variation65.

Nei’s genetic distances and identity were estimated to determine the relationships among mangoes from the four geographic populations, i.e. East India (EI), West India (WI), North India (NI) and South India (SI), and a dendrogram was generated. Results revealed the minimum Nei’s genetic distances between the EI and SI populations, which were 0.07, 0.05 and 0.06 for cumulative gene targeted, arbitrary and SSR markers, respectively. These two populations are neighbours, and hence geographical closeness might have played a role in the clustering of the provenances. Both regions also share similar climatic conditions, i.e. tropical and sub-tropical. Another factor contributing to such genetic proximity might be the latitude, longitude and altitude of the respective regions. These factors have been found to be responsible in studies with other plant species (Xie et al.43, Yan et al.44). Compared to EI and SI, varying climatic conditions (ranging from tropical wet and dry to semi-arid) might have resulted in distinctive genetic variability in the WI population with respect to all DNA marker systems employed. The same pattern of clustering among geographical populations was also manifested with SSR, cumulative RAPD + ISSR + DAMD and SCoT + CBDP markers. In the present study, the genetic variability was found to be congruent with the geographical diversity. Likewise, an NJ dendrogram was prepared for the three fruit status populations. The NJ clustering revealed a more or less similar grouping pattern for all three methods. While the Hybrid (H) and Selection (S) populations revealed the closest genetic relationship, the Local (L) population was distinctive.
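The identity and distance values discussed above can be illustrated with a minimal sketch. It assumes Nei's standard measure (identity I = Jxy / sqrt(Jx Jy), distance D = −ln I), which the text does not specify, and biallelic loci summarised by the frequency of one allele in each population. The function name and the example frequencies are hypothetical and are not data from this study.

```python
import math


def nei_identity_distance(p_x, p_y):
    """Nei's standard genetic identity and distance for biallelic loci.

    p_x, p_y: per-locus frequencies of one allele in populations X and Y
    (the other allele has frequency 1 - p). Assumes the same loci, in the
    same order, in both populations.
    """
    if len(p_x) != len(p_y) or not p_x:
        raise ValueError("both populations need the same, non-empty set of loci")
    n = len(p_x)
    # Average probability of identity of two alleles drawn within X, within Y,
    # and one from each population
    jx = sum(p * p + (1 - p) ** 2 for p in p_x) / n
    jy = sum(p * p + (1 - p) ** 2 for p in p_y) / n
    jxy = sum(a * b + (1 - a) * (1 - b) for a, b in zip(p_x, p_y)) / n
    identity = jxy / math.sqrt(jx * jy)
    distance = -math.log(identity)  # undefined if the populations share no alleles
    return identity, distance


# Hypothetical allele frequencies at four loci in two populations
identity, distance = nei_identity_distance([0.9, 0.5, 0.2, 0.7], [0.8, 0.6, 0.3, 0.7])
print(round(identity, 3), round(distance, 3))
```

Populations with similar allele frequencies give an identity close to 1 and a distance close to 0, which is the pattern reported for the EI and SI populations.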

These results were supported by population analysis of cumulative arbitrary markers, of which RAPD revealed the most genetic differentiation (Gst = 0.115), the least gene flow (Nm = 3.821) and the maximum variation among populations (11%) with respect to fruit status. Aros et al.66 performed UPGMA cluster analysis among wild species and cultivated varieties of alstroemeria (Peruvian lily) using RAPD markers, comparable to our study. Discordance between dendrograms obtained with diverse marker types could be explained by the different nature of the markers (genetically inert and functionally active), the different regions of the genome targeted by the different marker techniques, the extent of polymorphism detected and the number of loci67,68. High bootstrap values at most of the nodes supported the stability of these dendrograms. Co-phenetic analysis was also done to evaluate the ‘goodness of fit’ of the resulting phylogenetic trees. Data based on RAPD + ISSR + DAMD, SCoT + CBDP and ISSR markers revealed the consistency of the inferred relationships, as the co-phenetic correlation values obtained were ≥ 0.7 for all marker systems, indicating a good fit.

In the present investigation, comparisons were made between combinations of markers, namely the arbitrary dominant (RAPD + ISSR + DAMD) and gene targeted (SCoT + CBDP) types, to test their combined ability to assess genetic diversity among mango populations of India. Cumulative multi-marker analyses have been reported to be more informative than individual ones when applied to horticultural crops like papaya69 and jatropha70. Our studies have shown that SCoT + CBDP markers were more effective than RAPD + ISSR + DAMD markers in estimating diversity among geographical and fruit status populations. Gene targeted and neutral markers were compared for detecting genetic variation in Morinda tomentosa by Arya et al.61, who reported that gene targeted markers were more useful than random markers in detecting polymorphism, similar to our results. It is, therefore, advisable to employ combined and multiple marker analyses (distinct and analogous) when assessing genetic diversity and relationships among populations, because such markers target different genomic fractions, thus providing complementary information and offering more accurate and conclusive results71.

Results of our investigation demonstrate that genetic diversity was high within populations but relatively low among populations. The level of genetic variation is influenced by an array of determinants, such as gene flow, geographic conditions and genetic drift. Geographic isolation may lead to a loss of genetic diversity due to reduced gene flow and genetic drift. In the absence of timely protective measures, the intra-population genetic diversity might decline due to inbreeding. Hence, genetic rescue of promising local mango landraces via in situ and ex situ conservation measures would be highly desirable12. The multiple DNA marker-based diagnostic profiles of mangoes generated in the present study provide a means of rapid characterization of established as well as local genotypes within the populations and thus enable the selection of appropriate genotypes for conservation, sustainable management and commercial exploitation, such as profitable utilization in bio-prospection programs and as parents for breeding aimed at genetic improvement of this important edible plant genetic resource.

Conclusions

In essence, the present study demonstrates the usefulness of DNA marker systems for elucidation of the genetic variability within and among Indian mango populations based on their geographic origin or status of the fruit.

The maximum genetic heterogeneity observed for East Indian populations of mango genotypes, including local elites of the State of Odisha, indicates that they hold the greatest potential to adapt and evolve compared to other geographical populations. SSR and gene targeted markers were more efficient than arbitrary markers for evaluating the genetic diversity of different populations. SSR and DAMD markers showed slightly more differentiating ability among geographic populations, as did RAPDs for local, hybrid and selection populations. The high levels of polymorphism obtained with all marker systems used in the present study for population diversity analysis clearly proved their usefulness in estimating the genetic variability among Indian mango germplasm. A combined marker approach was more informative than the use of individual markers in population diversity analysis. SCoT + CBDP markers proved to be more effective than RAPD + ISSR + DAMD markers with respect to parameters for evaluating diversity among populations grouped by geographical location or fruit status. Understanding the genetic diversity among and within populations of Indian mangoes will be a useful resource for future germplasm conservation, maintenance, exploitation and breeding aimed at their genetic improvement.

Materials and methods

Plant material

In the present investigation, we used 70 promising Indian mango genotypes (Table 1), encompassing 26 selections (commercially grown mango genotypes), 23 hybrids (developed through breeding at different institutes) and 21 local germplasms (landraces), which represent different eco-geographical locations of India and are maintained in field gene banks.

Genomic DNA extraction and quantification

Fresh and preserved leaves of three different ages (tender, young and mature) from the 70 mango genotypes were used for DNA extraction. Extraction of total genomic DNA was carried out as described by Doyle and Doyle72,73 with minor modifications8. A sharp glowing band was observed, indicating the presence of good quality genomic DNA. DNA concentration was estimated by comparing the fluorescent intensity of the bands with the standard (λ DNA/EcoRI digest, Bangalore Genei Pvt. Ltd., India), following the method described by Sambrook et al.74. Portions of the stock DNA samples were diluted with an appropriate amount of TE buffer to yield a working concentration of 29–40 ng/µl for downstream marker analysis. DNA samples were stored in a refrigerator (4 °C).

PCR amplification

Genotyping was performed using a total of 278 primers that included individual and combined arbitrary (80 RAPD, 53 ISSR, 41 DAMD), gene-targeted (48 SCoT, 56 CBDP) and co-dominant sequence specific (72 SSR) markers, using the protocol described by Jena et al.8. Only 208 of the primers screened produced discrete profiles consisting of polymorphic, reproducible fragments and were retained for further analysis (Table 2). The RAPD primers included the OPA, OPC and OPD series (Operon Technologies, Alameda, California, USA) and the RPI-C series (Bangalore Genei, India). The ISSR primers were from the University of British Columbia, Canada. DAMD primers were selected based on the minisatellite core sequences in rice75,76,77, humans78,79, fungi80 and phage M13; SCoT primers were those designed by Collard et al.81 and Luo et al.82. A total of 56 CBDP primers (50 designed and 6 from Singh et al.83) and microsatellite (SSR) primers selected from earlier reports84,85,86,87 were synthesized by Bangalore Genei, India. Detailed primer sequence information for all markers is documented (Jena et al.8). All amplified products (RAPD, ISSR, DAMD, SCoT and CBDP) were resolved by electrophoresis on 1.5% agarose gels in 1 × TBE buffer. For SSR, amplification products were resolved on 3% agarose gels. The amplified fragments were photographed using a gel documentation system (Bio-Rad, USA) and stored as digital images using the built-in software.

Molecular data analysis

All amplifications were repeated thrice and only reproducible and consistent bands were considered for analysis. The distinct amplicons were scored visually as discrete variables using 1 for presence and 0 for absence separately for each marker and a binary matrix was obtained. Binary marker data were used for the purpose of revealing association of molecular markers with pomometric traits.

Analysis of population genetic variability and AMOVA

To evaluate variability among and within populations, 70 different mango genotypes were classified based on:

  a) Geographic origin [(i) East India (EI, 29 genotypes), (ii) West India (WI, 13 genotypes), (iii) North India (NI, 13 genotypes) and (iv) South India (SI, 15 genotypes)]

  b) Fruit status [(i) Selection (S, 26 genotypes), (ii) Hybrid (H, 23 genotypes) and (iii) Local (L, 21 genotypes)]

The basic parameters of genetic diversity, namely observed number of alleles (Na), effective number of alleles (Ne), Nei's genetic diversity (h), Shannon's information index (I), polymorphic bands percentage (PB%), total genetic diversity (Ht), genetic diversity within populations (Hs), coefficient of genetic differentiation (Gst) and gene flow (Nm), were calculated using POPGENE software version 1.3188. Genetic diversity within and among populations was estimated by analysis of molecular variance (AMOVA) using GenAlEx 6.502 software89 to study molecular variation at the population level. The significance of the results was tested using 9999 random permutations of the data.
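To make the listed parameters concrete, the following minimal sketch computes them from a binary band matrix for a dominant marker. It is not the POPGENE implementation. It assumes diploid genotypes in Hardy–Weinberg equilibrium, so that the null-allele frequency at a locus is the square root of the band-absence frequency, and it uses the standard relation Gst = (Ht − Hs)/Ht. The function names and the toy matrix are hypothetical.

```python
import math


def dominant_diversity(binary):
    """Mean diversity parameters from a genotypes x loci matrix of 0/1 scores.

    Assumption: Hardy-Weinberg equilibrium, so the null (band-absent) allele
    frequency q is sqrt(frequency of absence) and p = 1 - q.
    """
    n_genotypes = len(binary)
    n_loci = len(binary[0])
    ne_sum = h_sum = i_sum = 0.0
    polymorphic = 0
    for j in range(n_loci):
        present = sum(row[j] for row in binary) / n_genotypes
        q = math.sqrt(1.0 - present)
        p = 1.0 - q
        homozygosity = p * p + q * q
        ne_sum += 1.0 / homozygosity        # effective number of alleles
        h_sum += 1.0 - homozygosity         # Nei's gene diversity
        i_sum -= sum(f * math.log(f) for f in (p, q) if f > 0)  # Shannon's index
        if 0.0 < present < 1.0:             # band neither fixed nor absent
            polymorphic += 1
    return {
        "Na": 1.0 + polymorphic / n_loci,   # 2 alleles if polymorphic, else 1
        "Ne": ne_sum / n_loci,
        "h": h_sum / n_loci,
        "I": i_sum / n_loci,
        "PB%": 100.0 * polymorphic / n_loci,
    }


def gst_from_ht_hs(ht, hs):
    """Coefficient of gene differentiation, Gst = (Ht - Hs) / Ht."""
    return (ht - hs) / ht


# Toy matrix: 4 genotypes x 3 loci (hypothetical scores)
toy = [[1, 1, 0], [1, 1, 1], [1, 0, 1], [1, 0, 0]]
print(dominant_diversity(toy))
```

In this sketch a locus counts as polymorphic when the band is present in some but not all genotypes; software packages may apply a frequency threshold instead.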

Similarity-based clustering and construction of phylogenetic tree

Pairwise similarity matrices were generated by calculating Jaccard's similarity coefficient90 among all possible pairs of genotypes to estimate their genetic similarity, using the SIMQUAL (Similarity of Qualitative Data) option of the Numerical Taxonomy and Multivariate Analysis System, NTSYS-pc software version 2.0291. These similarity matrices were then subjected to the Sequential, Agglomerative, Hierarchical and Nested (SAHN) clustering method, and dendrograms were constructed with the unweighted pair group method with arithmetic average (UPGMA)92 clustering algorithm in NTSYS-pc. To evaluate the relationships among different populations of Indian mangoes, a neighbour joining (NJ) dendrogram was constructed based on Nei's genetic distance93. To estimate the robustness and validity of the dendrogram topology and clustering, bootstrap analyses were performed with 1000 bootstrap samples using the software WINBOOT94.
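The two clustering steps described above, Jaccard similarity followed by UPGMA, can be sketched in plain Python. This illustrates the technique and does not reproduce the NTSYS-pc workflow; the function names, the toy band profiles and the convention for two empty profiles are assumptions introduced for the example.

```python
from itertools import combinations


def jaccard_similarity(a, b):
    """Jaccard coefficient for two 0/1 band profiles.

    Shared bands divided by bands present in either profile; joint absences
    are ignored. Returning 1.0 for two empty profiles is an assumption.
    """
    shared = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return shared / either if either else 1.0


def upgma(profiles):
    """UPGMA on Jaccard similarities.

    profiles: dict mapping genotype name -> list of 0/1 scores.
    Returns the merges in order as (cluster_a, cluster_b, mean_similarity).
    """
    sim = {frozenset(pair): jaccard_similarity(profiles[pair[0]], profiles[pair[1]])
           for pair in combinations(profiles, 2)}

    def mean_sim(c1, c2):
        # Unweighted arithmetic mean over all between-cluster genotype pairs
        total = sum(sim[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        # Join the two clusters with the highest mean similarity
        c1, c2 = max(combinations(clusters, 2), key=lambda pair: mean_sim(*pair))
        merges.append((c1, c2, mean_sim(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Hypothetical band profiles for four genotypes
toy = {"G1": [1, 1, 0, 0], "G2": [1, 1, 1, 0], "G3": [0, 0, 1, 1], "G4": [0, 1, 1, 1]}
for step in upgma(toy):
    print(step)
```

Each merge corresponds to one node of the dendrogram, with the mean similarity giving the level at which the two clusters join.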