Abstract
Genomic selection is a promising breeding technique for tree crops to accelerate the development of new cultivars. However, factors such as genetic structure can create spurious associations between genotype and phenotype due to the shared history between populations with different trait values. Genetic structure can therefore reduce the accuracy of the genotype to phenotype map, a fundamental requirement of genomic selection models. Here, we employed 272 single nucleotide polymorphisms from 208 Mangifera indica accessions to explore whether the genetic structure of the Australian mango gene pool explained variation in trunk circumference, fruit blush colour and intensity. Multiple population genetic analyses indicate the presence of four genetic clusters and show that the most genetically differentiated cluster contains accessions imported from Southeast Asia (mainly those from Thailand). We find that genetic structure was strongly associated with three traits: trunk circumference, fruit blush colour and intensity in M. indica. This suggests that the history of these accessions could drive spurious associations between loci and key mango phenotypes in the Australian mango gene pool. Incorporating such genetic structure in associations between genotype and phenotype can improve the accuracy of genomic selection, which can assist the future development of new cultivars.
Similar content being viewed by others
Introduction
Horticultural tree crops are vital for sustainable food production1 and ornamental and industrial use. Tree crops can be more sustainably cultivated over time than annual field crops, thus helping to manage food supply for an increasing world population2. To create new tree fruit cultivars with improved productivity and quality, we must develop breeding technologies that overcome biological limitations to their production. Tropical species, such as mango, are often large and vigorous3, leading to canopies that rapidly outgrow their orchard space. This generates shade, providing a breeding ground for disease4. To avoid the adverse effects of tree size, trees are traditionally planted at low density and heavily pruned each year4, leading to a reduction in overall production per hectare and an increased cost per unit output. Consequently, a quest to breed smaller, less vigorous trees while maintaining high yields of quality fruit is underway5,6. Such efforts will produce mango that can be grown in intensive, high-density orchards that produce more fruit per hectare7.
Traditional tree breeding is slow, as evaluations require an assessment of phenotypic performance in mature trees over many years to account for the effects of variable spatial and temporal environments on phenotypic diversity. These evaluations, in combination with a long juvenile phase (typically 2–4 years4), can result in a selection process of up to or longer than 10 years from field planting8, making the rapid development of new cultivars unfeasible. The time for cultivar development could be reduced by predicting future phenotypic performance in young individuals using genomic selection, as demonstrated in apples9, sweet cherry10 and strawberry11. Genomic selection uses genotype to phenotype maps from a training population to predict phenotypic variation in untested populations using marker data12,13. Thus, once a genomic selection model has been created, the length and expense of phenotyping key traits may be reduced. Genomic selection for tree size and vigour of progeny could therefore improve the breeding process and reduce the cost of mango breeding compared to traditional breeding approaches.
The primary assumption of genomic selection is that genetic markers are closely linked on a chromosome with the causative loci that contribute to the trait of interest14. In general, the closer the marker is to the causative loci, the more accurate the genotype to phenotype map. However, genetic structure can create statistical associations between loci that are not physically linked. This occurs because evolutionary forces such as migration, drift and mutation can make allelic combinations between unlinked loci more common than expected by chance15. Genetic structure can therefore create spurious associations between genetic markers and traits. Furthermore, genetic structure is often prevalent in modern crops, particularly those moving across the world via human migrations, which likely experienced drastic fluctuations in population size and suffered from inbreeding after crossing genetically related individuals with favourable traits16.
Differentiating uninformative loci due to genetic structure from those linked to causative loci is a common problem observed in genetic studies of human disease17,18 and the study of trait evolution across diverse taxa19,20,21,23. Fortunately, we can improve the accuracy of the genotype to phenotype map by accounting for genetic covariation between traits and markers due to genetic structure24–26, a practice that can potentially improve the quality of horticultural breeding programs that start from highly variable germplasm collections. Here, we evaluate the assumption that horticultural trait variation segregates independently from genetic structure using Mangifera indica in the gene pool of the Australian Mango Breeding Program.
Mango is a major horticultural tree crop worldwide, yet an understanding of the domestication history is still debated. The centre of origin of the genus Mangifera is Southeast Asia, but the origin of the species M. indica is still under question. Based on the fossil record, Mukherjee27 and Blume28 suggested that mango originated in the Malay Archipelago less than 2.58 million years ago. However, recent molecular taxonomy suggests it evolved within a large area of Northwest Myanmar, Bangladesh and Northeast India29. From this area, human migration and trading led to the dispersion of mangoes to many regions of the world30.
Several studies have evaluated the genetic structure of domesticated mango31,32,33,34,35,36,38. Yet, to our knowledge, there have been no published studies on the effects of genetic structure on phenotypic variation in mango accessions. One study with 60 mango accessions from India accounted for genetic structure in a marker trait analysis35, however, Lal et al.35 did not assess the effect of genetic structure on their genotype to phenotype map. Without understanding the effect of genetic structure on phenotypic diversity, we do not know whether we are creating false associations between genetic markers and key mango traits. Here, we directly examined the effects of genetic structure on the creation of spurious associations between genetic markers and three traits – trunk circumference (a proxy for tree size), fruit blush colour and intensity – in the Australian mango gene pool. We assessed 272 SNP markers genotyped in 208 M. indica accessions imported worldwide and revealed statistical associations between genetic markers and traits arising from genetic structure. These results will help guide future studies incorporating genetic structure into their genomic selection models.
Results
Genetic structure in the Australian mango gene pool
Genetic structure was found in both a hierarchical cluster analysis (HCA) and a principal component analysis (PCA) across all 208 M. indica accessions (Fig. 1). Consistent with a recent origin of all accessions, the HCA created a dendrogram with only short branches in the centre (Fig. 1a), indicating few genetic differences separate the clusters. The optimal number of genetic clusters was K = 4, as indicated by the HCA and the elbow plot. The elbow plot from the HCA shows diminishing returns in the amount of variance explained after five clusters (Fig. S1). In the dendrogram, cluster 1 is the most genetically differentiated cluster, which only contains accessions imported from Southeast Asia. Cluster 1 is most distinct from clusters 2 and 3. In contrast, cluster 4 is more similar to cluster 1 (Fig. 1a) and contains a mixture of samples across geographical regions (e.g., South Asia, Southeast Asia, Americas, and Oceania; Table 1; Fig. 2). In the reduced principal component (PC) space (Fig. 1b), genetic clusters largely overlap, with South Asian accessions (mostly Indian accessions) primarily concentrated in the centre of the multivariate space. Genetic clusters from Southeast Asia, the Americas, and Oceania occur towards the edges of the genotypic space, with Southeast Asia distinctly separated in the PC1 axis.
In agreement with the HCA and PCA results above, we identified genetic clusters across the 208 M. indica accessions (Fig. 3) using the Bayesian clustering approach implemented in STRUCTURE39. Most accessions contained large amounts of admixture or shared ancestral polymorphism, where portions of their genome were assigned to different genetic groups. When genetic differentiation was separated into only two groups (K = 2, see Methods), Southeast Asia formed one group, while all other accessions were in a second group (Fig. 3). Relaxing this constraint to K = 3 revealed the Americas and Oceania accessions each form a group. Populations are almost indistinguishable when K is larger than 4. Consistent with the elbow plot discussed above, the Evanno method40 and the log probability of K values show that K = 4 was the optimal number of clusters (Fig. S2). Most accessions show signatures of admixture as indicated by diversity from multiple groups. Admixture signals are particularly pronounced in accessions from South Asia, mainly those from India, which do not form a distinct genetic group with any K-value.
Together, the HCA, PCA and STRUCTURE results suggest that mango accessions of the Australian mango gene pool consist of four genetic groups. Southeast Asian accessions are most differentiated relative to the rest of the world, suggesting that these accessions might have evolved differently, thus creating a heterogenous gene pool for cultivar creation in the Australian Mango Breeding Program.
Patterns of genetic diversity across the Australian mango gene pool
Genetic diversity analyses revealed high levels of heterozygosity and variable patterns of inbreeding across regions (Table 2). Levels of expected heterozygosity (HE) and observed heterozygosity (HO) were high across the world, with the Americas having the highest levels of observed heterozygosity (HO = 0.49) and Southeast Asia having the lowest (HO = 0.39). Accessions from the Americas contain an excess of heterozygote individuals (i.e., a negative inbreeding coefficient; FIS = − 0.11; 95% CI − 0.13 to − 0.08). On the other hand, accessions from Southeast Asia are mildly inbred (i.e., a positive inbreeding coefficient; FIS = 0.08; 95% CI 0.06 to 0.11). Private alleles were absent in all regions, indicating either a large intermixing population or the presence of ancestral polymorphisms that have not been sorted across geography.
Genetic differentiation comparisons showed variable patterns of FST between genetic clusters and between regions of import. Comparisons between regions have low levels of FST, which range from − 0.016 to 0.112 (Table 3a). Southeast Asia and the Middle East, closely followed by the comparison between Southeast Asia and the Americas, showed the highest level of genetic differentiation (FST = 0.112 and 0.107, respectively). In contrast, FST between clusters ranged from 0.051 to 0.286, with cluster 1 comparisons having the highest values (Table 3b). Overall, there is low genetic divergence amongst regions of the Australian mango gene pool and high genetic divergence between genetic clusters.
Genetic structure and region of import influence phenotypic diversity
Phenotypic correlation analyses revealed associations between fruit blush colour and intensity but not between them and trunk circumference. Trunk circumference, a continuous trait, was highly variable at 9 years, ranging from 27 to 70 cm, while categorical fruit traits were less variable (see Fig. S3 for photos of each fruit blush colour and intensity category). In a single-factor linear model, fruit blush colour and intensity were strongly correlated (LR χ2 = 373.168, df = 4, p < 0.0001, R2 = 0.61). However, given that 39% of mango accessions lacked fruit blush colour and therefore lacked fruit blush intensity, we removed ‘no blush’ and retested the association. It led to a significant yet weaker association between the fruit traits (LR χ2 = 95.077, df = 3, p < 0.0001, R2 = 0.28), indicating the importance of no blush in our understanding of the genetics of blush in mango. We found no correlation between trunk circumference and fruit blush colour (Fig. S4; F4,203 = 1.093, p = 0.3613, R2 = 0.02) and trunk circumference and fruit blush intensity (Fig. S5; F4,203 = 1.473, p = 0.2118, R2 = 0.03), suggesting trunk circumference is likely to be genetically independent of these fruit traits.
Fruit blush traits are strongly associated with the region of import in the Australian mango gene pool. In single trait linear models, region of import showed a significant effect on fruit blush colour (Fig. 4a; LR χ2 = 77.768, df = 12, p < 0.0001, R2 = 0.14) and fruit blush intensity (Fig. 4b; LR χ2 = 98.936, df = 3, p < 0.0001, R2 = 0.18), but not trunk circumference (F3,188 = 1.970, p = 0.1200, R2 = 0.03). Of the regions that had more than ten samples, trunk circumference ranged from a mean of 48.1 ± 1.8 (n = 38) in South Asia to a mean of 52.5 ± 1.3 (n = 46) in the Americas (Table S1). For fruit blush colour (Table S2), 67% of accessions from Southeast Asia had no blush colour, while only 11% from the Americas had no blush, with most having red blush (43%). For fruit blush intensity (Table S3), the Americas had 41% of accessions with a medium blush intensity that resembled the Haden accession. In comparison, Oceania had 39% of accessions with slight blush intensity resembling the Kensington Pride accession. Contrastingly, 94% of Southeast Asian accessions and 82% of South Asian accessions had no blush or barely visible blush intensity.
Fruit blush colour, intensity and trunk circumference were all associated with the four clusters assigned in the HCA. Cluster assignment had a significant effect on fruit blush colour (LR χ2 = 47.074, df = 12, p < 0.0001, R2 = 0.08) and the presence of blush (LR χ2 = 28.046, df = 3, p < 0.0001, R2 = 0.10), where 18% of individuals in cluster 1 had blush, whereas 70% and 69% of individuals from clusters 2 and 3 had blush, respectively. Cluster 1 is more likely to have lower blush intensity than the other clusters when the ‘no blush’ category is excluded (LR χ2 = 12.274, df = 3, p = 0.0065, R2 = 0.04; odds ratios between cluster 1 and clusters 2 to 4 ranged from 3.8 to 10.5). Finally, cluster had a significant effect on trunk circumference (F3,204 = 18.410, p < 0.0001, R2 = 0.21), where cluster 1 (mean = 52.3 ± 1.5, n = 28) and cluster 2 (mean = 53.7 ± 0.8, n = 102) had the largest trunk circumference and cluster 4 had the smallest (mean = 36.8 ± 2.9, n = 11). Overall, we expect that genetic diversity and factors specific to the region of import will likely influence the genotype to phenotype map of these key mango traits.
Discussion
Genetic structure arises from evolutionary processes such as mutation, migration and genetic drift, which drive shifts in allelic frequency that could cause statistical associations between random genetic markers and traits41. Such variation arising from genetic structure is often confounded with loci contributing to trait variation in association studies17,18,19,20,21,23, which can misrepresent the genotype to phenotype map assumed in genomic selection models. Our study shows how genetic structure in M. indica can lead to statistical associations between genetic markers and three phenotypic traits measured in this study – trunk circumference, fruit blush colour and intensity. This suggests that the genetic architecture of these horticultural traits contains noise arising from the conflation of phenotypic and historical differences in the Australian mango gene pool. Such noise can create spurious associations that hinder the selection of new cultivars, so we recommend that future studies in mango breeding take this into consideration.
Genetic variability and divergence in the Australian mango gene pool can be understood in two ways. On the one hand, accessions imported from different regions are weakly differentiated. On the other hand, genetic clusters are strongly differentiated, implying the existence of clear genetic groups. Results described in Fig. 2 reveal that genetic clusters are distributed across regions, implying that their genetic structure is shared across the world. The net effect of this nested relationship between geographic region and genetic cluster is low FST values amongst the regions yet high levels of FST amongst genetic clusters. This relationship can be used to hypothesise the causes of genetic divergence in the Australian mango gene pool.
In our study, cluster 1 (containing only Southeast Asian countries) comprises the most genetically differentiated accessions from across the world. Previous studies support this observation; Warschefsky and von Wettberg31 showed that accessions from Southeast Asia cluster together in a STRUCTURE plot, and Dillon et al.32 reached a similar conclusion using genetic distance analyses of 254 mango accessions. Surprisingly, we did not find private alleles (exclusive alleles) to Southeast Asia, such as those found in Warschefsky and von Wettberg31 (74 private alleles from a total of 364 SNPs; 20%). It is unclear what is driving this difference in the number of private alleles between the two studies. However, some of the factors that could be contributing to the variation in sampled loci include different cultivars, contrasting sequencing techniques (SNP chip vs. Restriction site associated DNA markers), different approaches for calling variants, and for filtering of the minor allele frequency42,43. The genetic differentiation observed between Southeast Asia and the rest of the world might have been driven by regional cultural differences. For example, in Southeast Asia, mangoes are incorporated into savoury dishes, which might have led to the selection of immature mangoes that stay green while ripening and therefore lack blush31. On the other hand, red blush is favoured around the world44, likely accentuating genetic differentiation between accessions from Southeast Asia and the rest of the world.
Artificial selection for these cultural preferences may have driven some of the genetic differentiation identified in the Australian mango gene pool. It is well accepted that selecting one trait can incidentally lead to the evolution of other traits through genetic linkage45,46. The genetic architecture of selected traits will largely determine the extent of this correlated evolution. In this study, we show that fruit blush colour and intensity are highly correlated, which might imply a shared genetic architecture. Therefore, selection for either of these traits could partially drive the evolution of the other. For example, the evolution of low blush intensity, but not trunk circumference, might have arisen from selection of low levels of blush colour in Southeast Asia. Selection of polygenic traits and recruitment of pleiotropic genes can also affect levels of genetic differentiation across the genome. Selection for trunk circumference, which is a polygenic trait47, might therefore drive changes in allelic frequencies across many loci. In contrast, fruit colour pigments and their levels, are often controlled by fewer loci in simpler biochemical pathways48–50. In general, we expect genes controlling plant growth and development51,52,53 to be important drivers of genetic differentiation between accessions and merit further attention considering the influence of the genetic architecture of selected traits on population structure.
Polyembryony could have contributed to the origin of genetic differences between Southeast Asia and other accessions. Southeast Asian accessions are typically polyembryonic, where all but one (the zygotic embryo) of the multiple somatic embryos are genetically identical to the maternal parent. Polyembryony is likely to easily be maintained under moderate to strong selection as it is thought to be inherited through a single dominant gene54,55. A high level of polyembryony can freeze the genetic diversity in a population, as instead of allowing hybridisation and creating unique individuals through recombination, it propagates genetically identical individuals56. Polyembryony can therefore create genetic bottlenecks if only a fraction of the original genetic diversity is propagated, consistent with the signature of inbreeding in Southeast Asian accessions we found in this study. Furthermore, previous studies have found genetic clustering of mango accessions according to their ability to produce polyembryonic seed57,58. However, embryo type is conflated with geographic region in these studies, where Southeast Asian accessions dominate the polyembryony types. Therefore, without future work teasing apart the contribution of polyembryony and geographic region than we lack an understanding of the various causes of polyembryonic selection and inbreeding on the genetic diversity of tree crops.
Genetic diversity and partitioning of genetic structure influence prediction accuracy in genomic selection models across horticultural crops16,24,26,59,60,61. For instance, increasing genetic diversity by using a variety of races or genetic clusters in the training and validation sets produced higher prediction accuracies in rice, sorghum16 and wheat61. But genetic diversity is known to reduce prediction accuracy when estimation error is high, which occurs in small populations or when there is low marker density61,62. By definition, using markers close to the causative variants will augment prediction accuracy during breeding; however, this is hard to achieve with low-density genotyping techniques such as SNP chips and Genotyping by Sequencing. With sequencing that covers the entire genome (e.g., whole genome sequencing), factors influenced by linkage disequilibrium can be better controlled, such as finding markers in tight linkage with causative loci. As such, population size, marker density, the genetic structure of the population, and the genetic architecture of the chosen traits will play a significant role in the accuracy of genomic selection models.
To ameliorate the adverse effects of genetic structure in genomic selection models, there are two major approaches used across horticultural crops16,24,26,59,61. The first approach includes principal components from genetic structure analyses as covariates in the model63,64,65,66. However, this method can double-count genetic structure because some elements are included in the model through the genomic relationship matrix67. Another common approach for accounting for genetic structure in genomic selection models is ensuring an equal contribution across genetic clusters in training and validation sets. This stratified sampling approach has been shown to increase prediction accuracy in sorghum16 and maize, and could be an effective method in the Australian mango gene pool. In general, choosing the most accurate genomic selection model will largely depend on the breeding population's genetic structure and the number of samples.
Conclusion
The results of this study reveal that a horticultural species spread across the world has a genetic structure that can create statistical associations between three key traits and genetic markers. To remove the effects of spurious markers, breeders should fully characterise the genetic structure of their breeding population. This will allow them to incorporate sample stratification to improve the performance of genomic selection models. Together with best practices of genomic selection (e.g., whole genome sequencing and large population size), these considerations can improve the genotype to phenotype map to assist in choosing individuals with accurate breeding values and help advance future parental selection. We hope our study encourages other horticultural breeding programs to follow similar methods.
Methods
Ethics statement
All plant material used in this research was sourced and collected from the Walkamin Research Station, Queensland (17.1341°S, 145.4271°E), where trees are held as a living collection. The Department of Agriculture and Fisheries granted permission as stated in the National Tree Genomics Program – Phenotype Prediction project (AS17000) for use and collection of materials from mango trees from their government station. This study complies with relevant institutional, national, and international guidelines and legislation.
Accessions
A total of 208 M. indica accessions were used from the gene pool collection of the Australian Mango Breeding Program at Walkamin Research Station. These accessions were imported from 21 countries across six geographical regions and were grafted onto the uniform polyembryonic rootstock, Kensington Pride. See Table 1 for the complete list of countries and sample sizes.
Genotyping
To identify some of the genotypic diversity in the Australian mango gene pool, we used the genotypes from Kuhn et al.68. DNA isolation for these genotypes was described in Kuhn et al.69. Briefly, young leaf samples were collected from Walkamin Research Station and the glasshouse at Mareeba Research Facility, Queensland (17.0075°S, 145.4295°E). DNA was extracted using 20 mg of fresh sample with the Qiagen Plant DNeasy kit. SNP genotyping was performed on these DNA samples using the Fluidigm EP-1 platform with 384 biallelic SNP markers. Finally, 272 SNP markers were selected for further analyses, where 236 markers belong to one of 20 linkage groups (7–20 markers per linkage group), and the location of the remaining 36 markers in the genome is unknown68. Genotypically identical individuals across the 272 SNPs were consolidated, leaving 208 mango accessions for the analyses. On average, 98% of the 272 SNPs used in this study were successfully genotyped in every accession.
Hierarchical cluster analysis
To examine the genotypic clustering of the mango accessions due to genotypic similarity, we performed a hierarchical cluster analysis (HCA) of the 208 M. indica accessions. First, pairwise genetic distances between all accessions were calculated using the percentage method by the “ape” v5.3 R-package70. The HCA was conducted by “stats” v3.6.2 R-package with complete linkage clustering. This computes all pairwise dissimilarities between the accessions in a cluster and accessions in another cluster and considers the largest value of these dissimilarities as a distance between the two clusters. To assess the optimal number of clusters, we used the elbow method71, which plots the total within-cluster sum of squares (WSS) against the number of clusters to show the ‘elbow’ where the WSS rate of decrease slows and indicates diminishing returns with more clusters72.
Principal components analysis
We assessed the major patterns of genetic similarity among the 208 mango accessions in multivariate space using a principal components analysis (PCA) with 272 SNPs. Missing SNP data were imputed using the regularised iterative PCA algorithm with the “missMDA” v1.17 R-package73. The PCA was performed using the “stats” v3.6.2 R-package74. Ellipses were constructed for each of the four clusters in the HCA to identify the position of every individual in a cluster in multivariate space with 95% probability.
Structure analysis
We determined levels of admixture between all 208 M. indica accessions with STRUCTURE v2.3.439. STRUCTURE is a Bayesian Markov chain Monte Carlo (MCMC) program that assigns individuals into genetic clusters (K) based on their genotypes by assuming Hardy Weinberg equilibrium within a cluster. It gives each accession an admixture coefficient to depict the proportion of the genome originating from a particular K cluster. We ran the admixture model and the correlated allele frequency model75 with ten independent runs of 100,000 burn-in and 100,000 MCMC iterations for K = 1 to K = 7. We visually inspected summary statistics of MCMC runs to ensure convergence of model parameters. Results were summarised and plotted in the “pophelper” v2.2.7 R-package76. The optimal K value (which represents the most likely number of sub-populations) was estimated by the Evanno method40, which uses the second-order rate of change in the log probability of data between successive K values in the R-package StructureSelector77. The optimal K value was also estimated using LnP(K), the mean log probability of the data. We also followed suggestions by Pritchard et al.78 and Lawson, et al.79 and plotted the lowest K values that capture the primary structure in the data.
Genetic diversity and genetic differentiation
To examine the level of differentiation between the clusters and geographical regions, Weir and Cockerham’s pairwise FST and 95% confidence intervals were estimated by “hierfstat” v0.4.22 R-package80. Each accession was assigned to a cluster based on the HCA, and each country of import was grouped into six geographic regions. We calculated 95% confidence intervals for each pairwise comparison using 1000 bootstrap replicates. Significance was determined by whether the confidence interval overlapped with 0.
Measures of genetic diversity were calculated for all 208 M. indica accessions for each of the six geographic regions. A genind object was created in “adegenet” v2.1.2 R-package81,82 for input into “hierfstat” v0.4.22 R-package80 to calculate observed heterozygosity (Ho), expected heterozygosity (HE) and the inbreeding coefficient (FIS). To determine whether FIS was significantly different from 0, we calculated 95% confidence intervals for each pairwise comparison using 1,000 bootstrap replicates. The number of private alleles (Pr) was calculated with the “poppr” v2.8.6 R-package83,84.
Phenotyping
To capture some of the phenotypic diversity in the Australian mango gene pool, we measured three traits in all 208 mango accessions – trunk circumference, fruit blush colour and fruit blush intensity. Trunk circumference was used as a proxy for tree size, as it has been found to be a strong indicator of tree size in other tree crops85,86,87. Trunk circumference was measured 10 cm above the graft when the trees were 9 years old at Walkamin Research Station. After maturity (> 5 years old), fruit blush colour and intensity were assessed once a year using ten ripe fruits from each mango accession for at least 2 years. Fruits were taken from the outside of the tree, where they are exposed to full sun and have well developed blush. Fruit blush included five categories: no blush, orange, pink, red and burgundy (Fig. S3a). Fruit blush intensity was recorded as five ordinal variables increasing in colour intensity (Fig. S3b), where the accessions in brackets best reflect the colour intensity: no blush, barely visible, slight (Kensington Pride), medium (Haden) and strong (Tommy Atkins).
The effect of region of import and genetic structure on phenotypic diversity
Tests of association were undertaken to examine the relationship between traits. Chi-square likelihood ratios were used to test phenotypic association amongst the categorical traits of fruit blush and intensity. We then performed the same analysis with the ‘no blush’ category removed to test whether the association remains. A linear model was performed to test for an association between trunk circumference and fruit blush colour, and also trunk circumference and fruit blush intensity.
To understand the effect of region of import on both genotype and phenotype in the Australian mango gene pool, we tested its association with genetic structure and phenotypic diversity. We investigated the influence of geographic region on phenotypic diversity for three key mango phenotypes – trunk circumference, fruit blush colour and intensity. We performed a likelihood-ratio chi-square test for fruit blush colour (categorical) and intensity (ordinal) against the region of import and a linear model for trunk circumference. Region of import was the explanatory variable in each model and included the regions shown in Table 1, excluding unknown regions (n = 7) and regions with low samples sizes, including the Middle East (n = 4), and Africa (n = 5).
We then tested for an effect of genetic structure on the three phenotypes using the optimal cluster assignment of K = 4 from the HCA. Likelihood-ratio chi-square tests were performed for whether cluster explained (1) fruit blush colour, and (2) the presence (n = 127) vs absence of blush (n = 81), irrespective of the intensity of blush. We then removed the individuals with no blush from the dataset to test whether there was a significant difference in fruit blush intensity between clusters for just the individuals with fruit blush using a likelihood-ratio chi-square test with an odds ratio. Finally, we performed a mixed linear model to test the effect of cluster on trunk circumference. JMP v15.2.0 (SAS 2015) produced all statistical results reported here.
Data availability
All data generated or analysed during this study are included in this published article (and its supplementary information files).
References
Smith, J. R. Tree Crops, A Permanent Agriculture. (Lulu.com, 2015).
Molnar, T. J., Kahn, P. C., Ford, T. M., Funk, C. J. & Funk, C. R. Tree crops, a permanent agriculture: concepts from the past for a sustainable future. Resources 2, 457–488. https://doi.org/10.3390/resources2040457 (2013).
Mizani, A. et al. in XI International Symposium on Integrating Canopy, Rootstock and Environmental Physiology in Orchard Systems 1228.1228 edn 167–174 (International Society for Horticultural Science (ISHS), Leuven, Belgium).
Bally, I. S. Mangifera indica (mango). Traditional Trees of Pacific Islands. Their Culture, Environment, and Use, 441–464 (2006).
Bally, I. S. E., Ping, L. & Johnson, P. R. Mango Breeding. In Breeding Plantation Tree Crops: Tropical Species (eds Mohan Jain, S. & Priyadarshan, P. M.) 51–82 (Springer New York, New York, NY, 2009). https://doi.org/10.1007/978-0-387-71201-7_2.
Bally, I. S. E. & Dillon, N. L. Mango (Mangifera indica L.) Breeding. In Advances in Plant Breeding Strategies: Fruits: Volume 3 (eds Al-Khayri, J. M. et al.) 811–896 (Springer International Publishing, Cham, 2018). https://doi.org/10.1007/978-3-319-91944-7_20.
Reddy, Y., Kurian, R. M., Ramachander, P., Singh, G. & Kohli, R. Long-term effects of rootstocks on growth and fruit yielding patterns of ‘Alphonso’ mango (Mangifera indica L.). Sci. Horticult. 97, 95–108. https://doi.org/10.1016/S0304-4238(02)00025-0 (2003).
Topp, B. L., Nock, C. J., Hardner, C. M., Alam, M. & O’Connor, K. M. Macadamia (Macadamia spp.) Breeding. In Advances in Plant Breeding Strategies: Nut and Beverage Crops: Volume 4 (eds Al-Khayri, J. M. et al.) 221–251 (Springer International Publishing, Cham, 2019). https://doi.org/10.1007/978-3-030-23112-5_7.
Kumar, S. et al. Genomic selection for fruit quality traits in apple (Malus×domestica Borkh). PLoS One 7, e36674. https://doi.org/10.1371/journal.pone.0036674 (2012).
Piaskowski, J. et al. Genomic heritability estimates in sweet cherry reveal non-additive genetic variance is relevant for industry-prioritized traits. BMC Genet. 19, 23. https://doi.org/10.1186/s12863-018-0609-8 (2018).
Gezan, S. A., Osorio, L. F., Verma, S. & Whitaker, V. M. An experimental validation of genomic selection in octoploid strawberry. Horticult. Res. 4, 16070. https://doi.org/10.1038/hortres.2016.70 (2017).
Desta, Z. A. & Ortiz, R. Genomic selection: genome-wide prediction in plant improvement. Trends Plant Sci. 19, 592–601. https://doi.org/10.1016/j.tplants.2014.05.006 (2014).
Heffner, E. L., Sorrells, M. E. & Jannink, J.-L. Genomic selection for crop improvement. Crop Sci. 49, 1–12. https://doi.org/10.2135/cropsci2008.08.0512 (2009).
Kijas, J. W. et al. Linkage disequilibrium over short physical distances measured in sheep using a high-density SNP chip. Animal Genet. 45(5), 754–757. https://doi.org/10.1111/age.12197 (2014).
Li, H. & Ralph, P. Local PCA shows how the effect of population structure differs along the genome. Genetics 211, 289. https://doi.org/10.1534/genetics.118.301747 (2019).
Sapkota, S. et al. Impact of sorghum racial structure and diversity on genomic prediction of grain yield components. Crop Sci. 60, 132–148. https://doi.org/10.1002/csc2.20060 (2020).
Berg, J. J. et al. Reduced signal for polygenic adaptation of height in UK Biobank. eLife 8, e39725. https://doi.org/10.7554/eLife.39725 (2019).
Sohail, M. et al. Polygenic adaptation on height is overestimated due to uncorrected stratification in genome-wide association studies. eLife 8, e39702. https://doi.org/10.7554/eLife.39702 (2019).
Cardon, L. R. & Palmer, L. J. Population stratification and spurious allelic association. The Lancet 361, 598–604. https://doi.org/10.1016/S0140-6736(03)12520-2 (2003).
Flint-Garcia, S. A. et al. Maize association population: a high-resolution platform for quantitative trait locus dissection. Plant J. 44, 1054–1064. https://doi.org/10.1111/j.1365-313X.2005.02591.x (2005).
Zhao, K. et al. An Arabidopsis example of association mapping in structured samples. PLoS Genet. 3, e4. https://doi.org/10.1371/journal.pgen.0030004 (2007).
Power, R. A., Parkhill, J. & de Oliveira, T. Microbial genome-wide association studies: lessons from human GWAS. Nat. Rev. Genet. 18, 41–50. https://doi.org/10.1038/nrg.2016.132 (2017).
Quignon, P. et al. Canine population structure: assessment and impact of intra-breed stratification on SNP-based association studies. PLoS One 2, e1324. https://doi.org/10.1371/journal.pone.0001324 (2007).
Guo, Z. et al. The impact of population structure on genomic prediction in stratified populations. Theor. Appl. Genet. 127, 749–762. https://doi.org/10.1007/s00122-013-2255-x (2014).
Teo, Y. Y. Common statistical issues in genome-wide association studies: a review on power, data quality control, genotype calling and population structure. Curr. Opin. Lipidol. 19, 133–143. https://doi.org/10.1097/MOL.0b013e3282f5dd77 (2008).
Werner, C. R. et al. How population structure impacts genomic selection accuracy in cross-validation: implications for practical breeding. Front. Plant Sci. https://doi.org/10.3389/fpls.2020.592977 (2020).
Mukherjee, S. K. Origin of mango. Indian J. Genet. Plant Breed. 11, 49–56 (1951).
Blume, C. L., Vol. 1 (1850).
Bompard, J. M. Taxonomy and systematics. In The Mango: Botany, Production and Uses (ed. Litz, R. E.) 19–41 (CABI, Wallingford, 2009). https://doi.org/10.1079/9781845934897.0019.
Mukherjee, S. & Litz, R. in The Mango: Botany, Production and Uses (ed R. E. Litz) (CAB international, 2009).
Warschefsky, E. J. & von Wettberg, E. J. B. Population genomic analysis of mango (Mangifera indica) suggests a complex history of domestication. New Phytol. 222, 2023–2037. https://doi.org/10.1111/nph.15731 (2019).
Dillon, N. L. et al. Genetic diversity of the Australian National Mango Genebank. Sci. Hortic. 150, 213–226. https://doi.org/10.1016/j.scienta.2012.11.003 (2013).
Schnell, R. et al. Mango genetic diversity analysis and pedigree inferences for Florida cultivars using microsatellite markers. HortScience 41, 993–993. https://doi.org/10.21273/JASHS.131.2.214 (2006).
Razak, S. A. et al. Assessment of diversity and population structure of mango (Mangifera indica L.) germplasm based on microsatellite (SSR) markers. Aust. J. Crop Sci. 13, 315 (2019).
Lal, S. et al. Association analysis for pomological traits in mango (Mangifera indica L.) by genic-SSR markers. Trees 31, 1391–1409. https://doi.org/10.1007/s00468-017-1554-2 (2017).
Sherman, A. et al. Mango (Mangifera indica L.) germplasm diversity based on single nucleotide polymorphisms derived from the transcriptome. BMC Plant Biol. 15, 277. https://doi.org/10.1186/s12870-015-0663-6 (2015).
Surapaneni, M. et al. Population structure and genetic analysis of different utility types of mango (Mangifera indica L.) germplasm of Andhra Pradesh state of India using microsatellite markers. Plant Systemat. Evolut. 299, 1215–1229. https://doi.org/10.1007/s00606-013-0790-1 (2013).
Hirano, R., Htun Oo, T. & Watanabe, K. N. Myanmar mango landraces reveal genetic uniqueness over common cultivars from Florida, India, and Southeast Asia. Genome 53, 321–330. https://doi.org/10.1139/g10-005 (2010).
Pickrell, J. K. & Pritchard, J. K. Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genet. 8, e1002967. https://doi.org/10.1371/journal.pgen.1002967 (2012).
Evanno, G., Regnaut, S. & Goudet, J. Detecting the number of clusters of individuals using the software structure: a simulation study. Mol. Ecol. 14, 2611–2620. https://doi.org/10.1111/j.1365-294X.2005.02553.x (2005).
Song, S., Dey, D. K. & Holsinger, K. E. Differentiation among populations with migration, mutation, and drift: implications for genetic inference. Evolution 60, 1–12. https://doi.org/10.1554/05-315.1 (2006).
Albrechtsen, A., Nielsen, F. C. & Nielsen, R. Ascertainment biases in SNP chips affect measures of population divergence. Mol. Biol. Evol. 27, 2534–2547. https://doi.org/10.1093/molbev/msq148 (2010).
Nielsen, R. Estimation of population parameters and recombination rates from single nucleotide polymorphisms. Genetics 154, 931–942. https://doi.org/10.1093/genetics/154.2.931 (2000).
Shruti, S. et al. Evaluation of newly developed mango (Mangifera indica) hybrids for their storage behaviour and peel colour. Indian J. Agric. Sci. 81, 252–255 (2011).
Burgess, K. S., Etterson, J. R. & Galloway, L. F. Artificial selection shifts flowering phenology and other correlated traits in an autotetraploid herb. Heredity 99, 641–648. https://doi.org/10.1038/sj.hdy.6801043 (2007).
Roff, D. A. Evolutionary Quantitative Genetics (Springer Science & Business Media, Heidelberg, 2012).
O’Connor, K. et al. Genome-wide association studies for yield component traits in a macadamia breeding population. BMC Genom. 21, 199. https://doi.org/10.1186/s12864-020-6575-3 (2020).
Grotewold, E. The genetics and biochemistry of floral pigments. Annu. Rev. Plant Biol. 57, 761–780. https://doi.org/10.1146/annurev.arplant.57.032905.105248 (2006).
Koes, R., Verweij, W. & Quattrocchio, F. Flavonoids: a colorful model for the regulation and evolution of biochemical pathways. Trends Plant Sci. 10, 236–242. https://doi.org/10.1016/j.tplants.2005.03.002 (2005).
Kayesh, E. et al. Fruit skin color and the role of anthocyanin. Acta Physiol. Plant. 35, 2879–2890. https://doi.org/10.1007/s11738-013-1332-8 (2013).
Smeekens, S., Ma, J., Hanson, J. & Rolland, F. Sugar signals and molecular networks controlling plant growth. Curr. Opin. Plant Biol. 13, 273–278. https://doi.org/10.1016/j.pbi.2009.12.002 (2010).
Teale, W. D., Paponov, I. A. & Palme, K. Auxin in action: signalling, transport and the control of plant growth and development. Nat. Rev. Mol. Cell Biol. 7, 847–859. https://doi.org/10.1038/nrm2020 (2006).
Mishra, B. S., Sharma, M. & Laxmi, A. Role of sugar and auxin crosstalk in plant growth and development. Physiol. Plant. 174, e13546. https://doi.org/10.1111/ppl.13546 (2022).
Brettell, R. I. S., Johnson, P. R., Kulkarni, V. J., Müller, W. & Bally, I. S. E. 645 edn 319–326 (International Society for Horticultural Science (ISHS), Leuven, Belgium).
Aron, Y., Gazit, S., Czosnek, H. & Degani, C. Polyembryony in mango (Mangifera indica L.) is controlled by a single dominant gene. HortScience 33, 1241–1242 (1998).
Hollister, J. D. et al. Recurrent loss of sex is associated with accumulation of deleterious mutations in oenothera. Mol. Biol. Evol. 32, 896–905. https://doi.org/10.1093/molbev/msu345 (2014).
Abirami, K., Singh, S., Singh, R., Mohapatra, T. & Kumar, A. R. Genetic diversity studies on polyembryonic and monoembryonic mango genotypes using molecular markers. Indian J. Horticult. 65, 258–262 (2008).
Shukla, M., Babu, R., Mathur, V. & Srivastava, D. Diverse genetic bases of Indian polyembryonic and monoembryonic mango (Mangifera indica L.) cultivars. Curr. Sci. 85, 870–871 (2004).
Crossa, J. et al. Genomic prediction in CIMMYT maize and wheat breeding programs. Heredity 112, 48–60. https://doi.org/10.1038/hdy.2013.16 (2014).
Isidro, J. et al. Training set optimization under population structure in genomic selection. Theor. Appl. Genet. 128, 145–158. https://doi.org/10.1007/s00122-014-2418-4 (2015).
Norman, A., Taylor, J., Edwards, J. & Kuchel, H. Optimising genomic selection in wheat: effect of marker density, population size and population structure on prediction accuracy. G3 Genes|Genom|Genet 8, 2889. https://doi.org/10.1534/g3.118.200311 (2018).
Lund, M. S., van den Berg, I., Ma, P., Brøndum, R. F. & Su, G. Review: how to improve genomic predictions in small dairy cattle populations. Animal 10, 1042–1049. https://doi.org/10.1017/S1751731115003031 (2016).
Bermingham, M. L. et al. Application of high-dimensional feature selection: evaluation for genomic prediction in man. Sci. Rep. 5, 10312. https://doi.org/10.1038/srep10312 (2015).
Price, A. L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38, 904–909. https://doi.org/10.1038/ng1847 (2006).
Patterson, N., Price, A. L. & Reich, D. Population structure and eigenanalysis. PLoS Genet. 2, e190. https://doi.org/10.1371/journal.pgen.0020190 (2006).
Peloso, G. M., Timofeev, N. & Lunetta, K. L. Principal-component-based population structure adjustment in the North American Rheumatoid Arthritis Consortium data: impact of single-nucleotide polymorphism set and analysis method. BMC Proc. 3, S108. https://doi.org/10.1186/1753-6561-3-S7-S108 (2009).
Janss, L., de los Campos, G., Sheehan, N. & Sorensen, D. Inferences from genomic models in stratified populations. Genetics 192, 693–704. https://doi.org/10.1534/genetics.112.141143 (2012).
Kuhn, D. N. et al. Estimation of genetic diversity and relatedness in a mango germplasm collection using SNP markers and a simplified visual analysis method. Sci. Hortic. 252, 156–168. https://doi.org/10.1016/j.scienta.2019.03.037 (2019).
Kuhn, D. N. et al. Genetic map of mango: a tool for mango breeding. Front. Plant Sci. 8, 577. https://doi.org/10.3389/fpls.2017.00577 (2017).
Paradis, E. & Schliep, K. ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics 35, 526–528. https://doi.org/10.1093/bioinformatics/bty633 (2019).
Daru, B. H., Elliott, T. L., Park, D. S. & Davies, T. J. Understanding the processes underpinning patterns of phylogenetic regionalization. Trends Ecol. Evol. 32, 845–860. https://doi.org/10.1016/j.tree.2017.08.013 (2017).
Kassambara, A. Practical Guide to Cluster Analysis in R: Unsupervised Machine Learning. Vol. 1 (Sthda, 2017).
Josse, J. & Husson, F. missMDA: a package for handling missing values in multivariate data analysis. J. Stat. Softw. 70, 31. https://doi.org/10.18637/jss.v070.i01 (2016).
R Core Team. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria (2019).
Falush, D., Stephens, M. & Pritchard, J. K. Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics 164, 1567–1587. https://doi.org/10.1093/genetics/164.4.1567 (2003).
Francis, R. M. pophelper: an R package and web app to analyse and visualize population structure. Mol. Ecol. Resour. 17, 27–32. https://doi.org/10.1111/1755-0998.12509 (2017).
Li, Y. L. & Liu, J. X. StructureSelector: a web-based software to select and visualize the optimal number of clusters using multiple methods. Mol. Ecol. Resour. 18, 176–177. https://doi.org/10.1111/1755-0998.12719 (2018).
Pritchard, J. K., Stephens, M. & Donnelly, P. Inference of population structure using multilocus genotype data. Genetics 155, 945-959. https://doi.org/10.1093/genetics/155.2.945 (2000).
Lawson, D. J., van Dorp, L. & Falush, D. A tutorial on how not to over-interpret STRUCTURE and ADMIXTURE bar plots. Nat. Commun. 9, 3258. https://doi.org/10.1038/s41467-018-05257-7 (2018).
Goudet, J. Hierfstat, a package for R to compute and test hierarchical F-statistics. Mol. Ecol. Notes 5, 184–186. https://doi.org/10.1111/j.1471-8286.2004.00828.x (2005).
Jombart, T. adegenet: a R package for the multivariate analysis of genetic markers. Bioinformatics 24, 1403–1405. https://doi.org/10.1093/bioinformatics/btn129 (2008).
Jombart, T. & Ahmed, I. adegenet 1.3–1: new tools for the analysis of genome-wide SNP data. Bioinformatics 27, 3070–3071. https://doi.org/10.1093/bioinformatics/btr521 (2011).
Kamvar, Z. N., Tabima, J. F. & Grünwald, N. J. Poppr: an R package for genetic analysis of populations with clonal, partially clonal, and/or sexual reproduction. PeerJ 2, e281. https://doi.org/10.7717/peerj.281 (2014).
Kamvar, Z. N., Brooks, J. C. & Grünwald, N. J. Novel R tools for analysis of genome-wide population genetic data with emphasis on clonality. Front. Genet. https://doi.org/10.3389/fgene.2015.00208 (2015).
Abd El-Wahab, R. H. et al. Anthropogenic effects on population structure of Acacia tortilis subsp. raddiana along a gradient of water availability in South Sinai Egypt. Afr. J. Ecol. 52, 308–317. https://doi.org/10.1111/aje.12121 (2014).
Cheng, F. S. & Roose, M. L. Origin and inheritance of dwarfing by the citrus rootstock Poncirus trifoliataflying dragon. J. Am. Soc. Hortic. Sci. 120, 286–291 (1995).
Sax, K. & Gowen, J. W. The place of stocks in the propagation of clonal varieties of apples. Genetics 8, 458 (1923).
Acknowledgements
We thank David Kuhn formerly of Agriculture Research Services, United States Department of Agriculture (ARS-USDA) and Barbara Freeman of ARS-USDA for providing the genotypes. This research was undertaken as part of the National Tree Genomics Program – Phenotype Prediction project (AS17000) which is funded by the Hort Frontiers Advanced Production Systems fund, part of the Hort Frontiers strategic partnership initiative developed by Hort Innovation, with co-investment from The University of Queensland and Queensland Government, and contributions from the Australian Government. The data and its use is by courtesy of the State of Queensland, Australia, through the Department of Agriculture and Fisheries.
Author information
Authors and Affiliations
Contributions
M.W., C.H. and D.O. designed the study. I.B., N.D. and A.A. collected and curated the data. M.W., R.Y. and M.J. performed data analysis. M.W., M.J., R.Y. and D.O. wrote the manuscript. D.O. and C.H. secured funding and were mentors. All authors reviewed the manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Wilkinson, M.J., Yamashita, R., James, M.E. et al. The influence of genetic structure on phenotypic diversity in the Australian mango (Mangifera indica) gene pool. Sci Rep 12, 20614 (2022). https://doi.org/10.1038/s41598-022-24800-7
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41598-022-24800-7
Comments
By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.