Introduction

Linkage disequilibrium (LD) refers to the association of particular allelic configurations at distinct loci in the genome of a sampled population (Weir, 1979). LD mapping, or association mapping, can be applied to natural populations, sets of germplasm accessions or recently developed cultivars (Rostoks et al., 2006; Robbins et al., 2011) and assumes that only markers in strong LD with a functionally important gene will be significantly associated with the variation in quantitative traits (Ardlie et al., 2002; Garris et al., 2003). LD mapping thus takes advantage of the much larger number of historical recombination events that have occurred over time compared with the relatively restricted levels of recombination inherent in the biparental mapping populations frequently used in quantitative trait loci (QTL) mapping studies to unravel the genetic basis of traits (Cardon and Bell, 2001). To apply LD mapping appropriately in crop plants, it is a prerequisite to understand the extent and patterns of LD in the particular populations being investigated (Lander and Schork, 1994). It is also crucial to be able to distinguish between physical LD and the other forces that can create LD in natural populations, to avoid the detection of spurious associations (Flint-Garcia et al., 2003).

LD can be caused by unknown population structure and several other forces, including mutation, drift, genetic bottlenecks, founder effects, selection and, specifically for plants, the level of inbreeding caused by their mating systems (Hartl and Clark, 1997). In contrast, physical LD tends to decrease continuously because of accumulated recombination events, which means that loci located far apart along chromosomes will generally remain in LD for shorter periods (Hartl and Clark, 1997; Mather et al., 2007). Outcrossing species generally exhibit low levels of LD because of the many opportunities for effective recombination in large, highly heterozygous populations (Yan et al., 2009). Many crop species, including asparagus bean, are instead inbreeders and would thus be expected to exhibit relatively higher levels of LD (Flint-Garcia et al., 2003; Morrell et al., 2005).

The extent and patterns of LD can be investigated at the haplotype, chromosome or whole-genome level. Tenaillon et al. (2001) sequenced 21 loci located on maize chromosome 1 and estimated that in landraces the LD decay distance was <1000 bp. In contrast, a study of 18 maize genes in 36 commercial inbred lines revealed extensive LD blocks as long as 100 kb (Ching et al., 2002). Thus, as pointed out by Yan et al. (2009), LD decay estimation based on a single chromosome or a limited number of loci may be biased. Uncovering the LD patterns of a plant species of interest at the genome level is therefore essential for the optimal design of genotyping efforts before conducting LD mapping and for assessing the statistical power and resolution of whole-genome association studies (McNally et al., 2009).

Cowpea (Vigna unguiculata L. Walp., 2n=2x=22) is an important self-pollinating grain legume, fodder and vegetable crop in many tropical/subtropical regions of the world. The two main divisions of cultivated cowpea are the dominant subspecies unguiculata, used primarily for dry grain and fodder, and sesquipedialis, which is harvested at the immature green pod stage and used as a vegetable (Timko et al., 2007). The latter is also known as asparagus bean or ‘yard-long’ bean, as it is characterized by very long (0.5–1 m) pods with less fiber, narrow kidney-shaped seeds and a climbing growth habit. Genetic similarity between the two subspecies is high, as evidenced by fully fertile hybrids and synteny for most of the shared molecular markers between genetic maps (Muchero et al., 2009; Xu et al., 2011). Cowpea is thought to have undergone a severe bottleneck during domestication (Panella and Gepts, 1992). Although genetic diversity has not been comprehensively assessed in the asparagus bean germplasm, it is likely to be low, consistent with the assumption that the crop derived from a limited sample of the wider cowpea gene pool that moved from Africa to east Asia, followed by a further genetic narrowing as strong selection for long pods gave rise to the present-day ssp. sesquipedialis (Fang et al., 2007; Xu et al., 2010). Assessing both the level of genetic variation and the genome-wide LD patterns in asparagus bean would thus not only be important for LD mapping but would also shed some light on the past effects of intensive human selection for vegetable use, compared with materials that are closer to the ancestral species.

Recently, a high-throughput Illumina GoldenGate assay platform with 1536 expressed sequence tag-derived single nucleotide polymorphism markers (SNPs) became available for cowpea. Using this system, high-density consensus genetic linkage maps for the ssp. unguiculata were constructed (Muchero et al., 2009; Lucas et al., 2011). A comparable genetic linkage map of asparagus bean also was constructed in the authors’ lab using the same system and additional SSR markers (Xu et al., 2011). The main objective of the present study was to analyze the genetic variation and LD patterns in asparagus bean based on a panel of 95 Chinese accessions genotyped using the above SNP markers.

Materials and methods

Plant materials

A sample set consisting of 95 asparagus bean accessions of wide geographic origin across China and four ssp. unguiculata accessions (as a control) was used in the current study. Details of the plant materials, including names, origins, subspecies assignment and morphological characteristics, can be found in Supplementary Table 1.

DNA extraction

Genomic DNA was extracted from leaves of 2-week-old plants using a DNeasy Plant DNA miniprep kit (Qiagen, Hilden, Germany) according to the procedures described by the manufacturer.

SNP genotyping

Each of the 99 accessions was genotyped for SNPs using the KASPar (KBiosciences, Hoddesdon, UK) 1127-SNP genotyping platform converted from the Illumina cowpea GoldenGate 1536-SNP assay system (Muchero et al., 2009). Names of the 1127 SNPs are listed in Supplementary Table 2, and their details, including map position information, can be accessed at HarvEST (http://www.harvest-web.org/hweb/bin/wc.dll?hwebProcess~hmain~&versid=68). After fluorescence scanning of the reactions, the results were interpreted with the software KlusterCaller 1.1 (KBiosciences); the resulting data can be accessed at the Dryad repository (doi:10.5061/dryad.6tv35cc2).

Inference of structure and kinship

Population structure was inferred using a Bayesian model-based clustering method implemented in the software STRUCTURE 2.3.3 (Pritchard et al., 2000) using data from 422 informative SNPs (see results below). We ran STRUCTURE under the ‘admixture model’ with a burn-in period of 100 000 followed by 100 000 replications of Markov Chain Monte Carlo. Three independent runs were performed for each number of clusters (K) from 1 to 10. The statistic ΔK, based on the relative rate of change in the likelihood of the data between successive K values, was used to determine the optimal number of clusters (Evanno et al., 2005). Lines with probability of membership >70% were assigned to a subgroup. No a priori population information was used. Pairwise genetic distances were calculated using the software Powermarker 3.25 under the Nei 1983 model (Liu and Muse, 2005). A relative kinship matrix was constructed using the software SPAGeDi, and negative kinship values between two individuals were set to 0 (Hardy and Vekemans, 2002).
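As an illustration of the ΔK criterion described above, the following minimal Python sketch computes ΔK from replicate Ln P(D) values. It is not the procedure used in this study: the function name `evanno_delta_k` and the input values are hypothetical, and the second-order difference is taken on the run means, which is one common way of implementing the statistic of Evanno et al. (2005).

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Return {K: delta K} from {K: [Ln P(D) of each replicate run]}.

    delta K = |L(K+1) - 2 L(K) + L(K-1)| / sd[L(K)], with L taken as the
    mean over replicate runs (an assumption of this sketch). It is
    undefined for the smallest and largest K tested.
    """
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in lnp_by_k or k + 1 not in lnp_by_k:
            continue  # needs both neighbours
        sd = stdev(lnp_by_k[k])
        if sd == 0:
            continue  # ratio undefined when replicate runs are identical
        second_diff = abs(
            mean(lnp_by_k[k + 1]) - 2 * mean(lnp_by_k[k]) + mean(lnp_by_k[k - 1])
        )
        delta[k] = second_diff / sd
    return delta


# Hypothetical Ln P(D) values for three runs at each K (not data from this study).
example = {
    1: [-1000, -1002, -998],
    2: [-800, -801, -799],
    3: [-780, -790, -770],
    4: [-775, -765, -785],
}
delta_k = evanno_delta_k(example)
best_k = max(delta_k, key=delta_k.get)
print(delta_k, best_k)
```

Because ΔK needs both neighbouring K values, a scan over K from 1 to 10 yields ΔK only for K from 2 to 9.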

Analysis of LD

LD was measured by calculating the squared correlation coefficient (r²) between each SNP pair with the software package TASSEL 2.1 (Bradbury et al., 2007). Only SNP loci with a minor allele frequency above 0.1 and at least 80% successful calls among the sample set were included in the LD analyses. P-values for each r² estimate were obtained with a two-sided Fisher's exact test as implemented in TASSEL. The critical r² for LD decay was determined by taking the parametric 95th percentile of the distribution of square-root-transformed r² values for all unlinked loci (Breseghello and Sorrells, 2006). LD plots against map distance for each linkage group (LG) were generated in Microsoft Excel (Redmond, WA, USA), where only r² values with P<0.01 were included. The map position of each marker was obtained from the improved cowpea consensus map, which allowed the maximum number of markers to be included (Lucas et al., 2011). The LD decay trend lines were generated using a window size-based method (Yan et al., 2009).
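The two quantities at the centre of this analysis can be sketched in a few lines of Python. This is a simplified illustration, not the TASSEL implementation: it assumes homozygous calls coded 0/1 with missing calls given as None, and it reads the "parametric 95th percentile" as the mean plus 1.645 standard deviations of the square-root-transformed values, squared back to the r² scale. The function names and the example values are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def r_squared(locus_a, locus_b):
    """Squared correlation between two biallelic loci coded 0/1 per line.

    Lines with a missing call (None) at either locus are skipped.
    """
    pairs = [(a, b) for a, b in zip(locus_a, locus_b)
             if a is not None and b is not None]
    n = len(pairs)
    p_a = sum(a for a, _ in pairs) / n           # frequency of allele 1 at locus A
    p_b = sum(b for _, b in pairs) / n           # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in pairs if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                         # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r2 is undefined for a monomorphic locus")
    return d * d / denom


def critical_r2(unlinked_r2):
    """Parametric 95th percentile of sqrt(r2) for unlinked pairs, squared back."""
    roots = [value ** 0.5 for value in unlinked_r2]
    z95 = NormalDist().inv_cdf(0.95)             # about 1.645
    threshold_root = mean(roots) + z95 * stdev(roots)
    return threshold_root ** 2


full_ld = r_squared([0, 0, 1, 1], [0, 0, 1, 1])
no_ld = r_squared([0, 0, 1, 1], [0, 1, 0, 1])
threshold = critical_r2([0.01, 0.04, 0.09])      # hypothetical unlinked r2 values
print(full_ld, no_ld, round(threshold, 3))
```

Pairs of linked loci whose r² exceeds such a threshold are then taken to be in LD because of physical linkage rather than background effects.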

Analysis of LD variance components

The computer program LinkDos (Garnier-gere and Dillmann, 1992) was employed to partition the variance of LD into within- (D_IS² and D'_ST²) and between- (D_ST² and D'_IS²) subpopulation components, following Ohta's (1982) D-statistics estimation method. Between-subpopulation variance components that are greater than the within-subpopulation components would suggest that genetic drift has an important role in shaping the observed patterns of LD. For each chromosome, all the marker pairs with r² ≥ 0.2 and P<0.01 were included in Ohta's analysis, and the results were averaged to give a comprehensive insight. The significance of the differences between within- and between-subpopulation variance components was tested by ANOVA.

Results

Data quality and genome wide SNP diversity

The KASPar SNP assay harboring 1127 SNPs gave a technical success rate of 97.7%, which means that 1102 SNPs were successfully called in the sample set. Of these, eight SNPs showed >20% missing data, 247 SNP loci were found to be monomorphic in all 99 lines and 420 SNPs had a minor allele frequency value below 0.1, which were removed from further analyses. Only five SNP loci were heterozygous in >10% of the accessions, consistent with the self-pollinating nature of asparagus bean and confirming the high level of homozygosity in the sample set. A final total of 422 SNPs were used for further data analysis. Of these, 415 SNP markers having known map positions and representing 370 unique loci were included for creating the LD plots (Supplementary Table 3). All the 11 LGs were covered by these SNPs, with the highest (76 SNPs) and lowest (23 SNPs) numbers observed in LG 3 and LG 10, respectively. The average marker distance was 1.58 cM, ranging from 1.01 to 2.69 cM across the 11 LGs. The allele frequency distribution of the SNPs, which are bi-allelic, showed a continuous pattern, with a peak position falling into the region of 0.1–0.18 (Figure 1).
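The SNP counts in this paragraph can be reconciled with a short tally. The sketch below is only a bookkeeping check in Python: the three stated filters leave 427 loci, and the final figure of 422 is reached only if the five highly heterozygous loci were also set aside, which the text does not state explicitly and is treated here as an assumption.

```python
# Counts quoted in the text.
called_snps = 1102
removed_by_stated_filters = {
    "more than 20% missing data": 8,
    "monomorphic in all 99 lines": 247,
    "minor allele frequency below 0.1": 420,
}
after_stated_filters = called_snps - sum(removed_by_stated_filters.values())

# Assumption (not stated in the text): the five loci heterozygous in >10%
# of the accessions were excluded as well.
highly_heterozygous = 5
retained = after_stated_filters - highly_heterozygous

print(after_stated_filters, retained)  # 427 422
```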

Figure 1

Allele frequency of the SNPs. Only SNPs with a minor allele frequency (MAF) >0.1 are shown. SNP frequencies >0.5 were shown as 0.5.

Identification of clusters and relative kinship in the sample set

Clustering inference performed with K from 1 to 10 showed that the most significant change of likelihood occurred when K increased from 1 to 2, and the highest ΔK value was observed at K=2 followed by a drastic decline of ΔK from K=3 (Table 1, Supplementary Figure 1). This suggested that the 99 genotypes could be assigned into two subgroups. Using a probability of membership threshold of 70%, 43 and 33 lines were assigned into the two subgroups, respectively. The remaining 23 lines were considered as intermediates (Supplementary Table 1, Supplementary Figure 2).
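The 70% membership rule used above can be expressed as a small helper. This is a minimal Python sketch with hypothetical membership coefficients; the function names are illustrative and the threshold is applied as a strict "greater than", following the ">70%" wording of the methods.

```python
def assign_subgroup(q, threshold=0.70):
    """Return the index of the cluster whose membership exceeds the threshold.

    q holds the membership coefficients of one line (one value per cluster).
    Lines without a dominant cluster return None (intermediate).
    """
    best = max(range(len(q)), key=lambda i: q[i])
    return best if q[best] > threshold else None


def count_assignments(q_matrix, threshold=0.70):
    """Count lines per subgroup; the key None collects the intermediates."""
    counts = {}
    for q in q_matrix:
        label = assign_subgroup(q, threshold)
        counts[label] = counts.get(label, 0) + 1
    return counts


# Hypothetical membership coefficients for four lines at K=2.
example_q = [[0.90, 0.10], [0.20, 0.80], [0.55, 0.45], [0.70, 0.30]]
counts = count_assignments(example_q)
print(counts)
```

In the present sample set the same rule gives 43, 33 and 23 lines for the two subgroups and the intermediates, which together account for all 99 genotypes.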

Table 1 Average Ln P(D), s.d. and ΔK values at K from 1 to 10 showing the population structure of the sample set

In general there was no association between the subgroups inferred from STRUCTURE and the geographic origin of the materials, reflecting the probable extensive exchange of parental lines by breeders nation-wide; however, we found strong associations of subgrouping with plant morphology and usage. For example, subgroup I consisted entirely of ‘standard vegetable’ type lines, that is, lines with the very long tender pods (mean 50.5 cm, median 50.2 cm) typical of commercial varieties for vegetable use and a strong climbing growth habit. In contrast, subgroup II (n=29 excluding the four ssp. unguiculata lines) mainly comprised the ‘non-standard vegetable’ type lines, which in general have shorter pods (mean 30.7 cm, median 30.6 cm) with higher fiber content, oval-shaped seeds suitable for grain use and dwarf or bush-type plant architecture (Supplementary Tables 1 and 4). The two subgroups were hereafter renamed as subgroup SV (standard vegetable) and NSV (non-standard vegetable), respectively. All four ssp. unguiculata lines were grouped into the NSV subgroup, suggesting a closer genetic relationship between this subgroup and the subspecies unguiculata.

The pairwise genetic distance among the 99 genotypes ranged from 0.01 to 0.58, with an average of 0.32. The greatest genetic distance was observed between ‘Charleston Greenpack’, an improved cowpea (ssp. unguiculata) cultivar from the southern United States (Fery, 1998) and ‘Prince Charming’, an improved Chinese asparagus bean (ssp. sesquipedialis) cultivar. The average genetic distance within subgroup SV (0.22) was lower than that in subgroup NSV (0.37). Analysis of relative kinship showed that 3124 (69%) of all the pairwise kinship values were between 0 and 0.05, whereas 18.9% of the values were above 0.25, indicating that most pairs of individuals are unrelated or only weakly related, whereas members of some subsets show a considerable level of relatedness.

LD in the asparagus bean genome

As the 95 asparagus bean accessions could be divided into two distinct subgroups, pairwise LD was estimated within each subgene pool. Globally, 3937 (4.6%) and 2909 (3.4%) of the total possible SNP locus pairs were in significant LD (P<0.01) in subgroup SV and NSV, respectively (Table 2). Of these, 756 and 628 of the significant associations were intrachromosomal, accounting for 8.6% and 7.1% of the total possible intrachromosomal correlations. The proportion of significant LD among unlinked loci relative to the total possible interchromosomal correlations was lower, being 4.3% and 3% in the two subgroups, respectively. In both subgroups, an uneven distribution of LD among the 11 chromosomes was observed.

Table 2 Locus pairs in significant (P<0.01) LD, r² values and extent of LD in the two subgene pools

Intrachromosomal LD extended over relatively long distances in asparagus bean, with means of 15 (median 9.5) and 11.2 (median 3.5) cM in the two subgene pools, respectively. Significant LD was even observed at distances over 60 cM, although only in very small proportions (2.4% and 1.7% of all the significant intrachromosomal associations in subgroup SV and NSV, respectively). The strength of LD was high in asparagus bean, as reflected by the mean r² of 0.39 and 0.34 in subgroup SV and NSV, respectively. Although variable among LGs, the level (r²) and extent of LD were generally higher in subgroup SV than in subgroup NSV (Table 2, Figure 2).

Figure 2

Decay of LD (r²) as a function of genetic distance (cM) between pairs of loci on individual LGs and the whole genome in asparagus bean. Only r² values with P<0.01 are shown.

Based on the parametric 95th percentile of the distribution of square-root-transformed r² values for unlinked markers, r² thresholds of 0.5 and 0.4 were adopted for estimating LD decay in subgroup SV and NSV, respectively. The results showed that, although r² values were high, LD decayed rapidly with increasing genetic distance in both subgene pools. On the whole genome scale, LD decayed within 2 cM (Figure 2). At the chromosome level, the LD for LG 3, 4 and 6 all decayed at 0–2 cM in subgroup SV, and for LG 1, 2, 7, 9, 10 and 11 within 2.5–5 cM. LD appeared more complicated for LG 5 and LG 8, as indicated by the uneven decay along the chromosomes. In subgroup NSV, LD decayed more rapidly. Six of the eleven chromosomes (LG 1, 2, 4, 6, 7 and 8) had LD decay distances near or shorter than 1 cM, whereas LD for the other five chromosomes all decayed within 2–3 cM.

LD partitioning

To discriminate between genetic drift and epistatic selection, two of the possible causes that create and maintain LD in natural populations, Ohta's (1982) analysis was performed by comparing the variance components within (D_IS² and D'_ST²) and between (D_ST² and D'_IS²) subpopulations. As shown in Table 3, D'_ST² was significantly greater than D'_IS² on the whole genome scale, whereas D_IS² was significantly smaller than D_ST², suggesting that globally the effects of epistatic selection and genetic drift cannot be simply compared. Similar results were also observed at the chromosome level; however, LD in at least two of the 11 LGs, that is, LG 5 and 7, may primarily be due to epistatic selection, because the within-subpopulation variance components were consistently greater than those between subpopulations (see also discussion below).

Table 3 Overall and linkage group (LG)-based Ohta's variance components of LD

Discussion

Genetic diversity and domestication history of Chinese asparagus bean germplasm

Unlike staple food crops such as rice and maize, for which abundant germplasm resources are available for LD analysis, the genetic resources of asparagus bean are limited. In the current study, 99 genotypes were selected, based on morphology, geographic origin and pedigree (if known), from the available Chinese asparagus bean germplasm to represent a putative core collection. Despite their diverse morphological characteristics and origins, the genetic diversity within the sample set is low, as reflected by the low level of SNP polymorphism and genetic distance. This result is consistent with previous phylogenetic studies based on AFLP or SSR markers (Fang et al., 2007; Xu et al., 2010). However, these results should be viewed with some caution because of potential ascertainment bias in the original selection of the SNPs placed on the platform used in the assays (Muchero et al., 2009). Although cowpea is highly variable in morphology, several authors provide evidence that it went through a severe genetic bottleneck during domestication and therefore has lower inherent genetic diversity (Panella and Gepts, 1992; Pasquet, 1999). Asparagus bean (ssp. sesquipedialis), as a subspecies of cowpea originating from only a small portion of the genetic variation of domesticated ssp. unguiculata, would have gone through an additional bottleneck due to founder effects and intense selection for pod characteristics favorable for vegetable use (Fang et al., 2007). Therefore, the low level of genetic diversity observed in this study across the Chinese asparagus bean germplasm fits well with its likely domestication history.

The classification of the 99 accessions into two subgroups that differ mainly in pod length and growth habit indicates the impact of long-term human selection for vegetable-use traits on the differentiation between ssp. unguiculata and asparagus bean, as lines with pods shorter than 36 cm or with dwarf/semi-dwarf architecture were grouped together with the four ssp. unguiculata accessions. The analysis of LD variance components also suggested that epistatic selection might be the main force shaping the current pattern of LD in at least part of the genome. Our results therefore provide evidence, from a population genetic perspective, that supports the long-standing hypothesis on the domestication history of cultivated asparagus bean mentioned above. We also note that three landrace accessions (No. 7, 18 and 80) that morphologically better fit the ‘standard vegetable’ type were classified into the NSV subgroup. This observation, though common, indicates that the genetic factors controlling morphological variation in pod length and growth habit are not the only forces dominating the stratification of Chinese asparagus bean germplasm. An interesting task for the near future is to address the mechanisms behind the domestication of cultivated asparagus bean more thoroughly.

Comparison of LD patterns between asparagus bean and other plant species

The LD level appears high in asparagus bean; however, decay of LD is still rapid (0–2 cM genome wide). Given the genome size of 630 Mb and the genetic linkage map of 680 cM (Lucas et al., 2011; Xu et al., 2011), this approximately equals 1.84 Mb of physical distance. Such level and extent of LD are comparable to those observed in the European barley collection (Rostoks et al., 2006), and the LD decay distance is even shorter than that in a small population of common bean (Phaseolus vulgaris L.) from America, a close relative of asparagus bean, where LD decays to 0.1–0.2 within 6–12 cM (McConnell et al., 2010). Collectively, these observations indicate that in inbreeding species including asparagus bean, the LD decay distances are fairly long. In contrast, LD declines very rapidly in outcrossing species where physical recombination is more effective. For instance, LD decays within only a few kilobases in maize (Remington et al., 2001; Yan et al., 2009) and only 200 bp in a wild sunflower population (Liu and Burke, 2006).
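The conversion from genetic to physical distance used above is a simple proportion, sketched below in Python. It assumes a uniform recombination rate along the genome, which is a simplification, and the function name `cm_to_mb` is illustrative. With the 630 Mb genome size and 680 cM map length quoted in the text, 2 cM corresponds to roughly 1.85 Mb, close to the figure given above; the small difference presumably reflects rounding of the Mb-per-cM ratio.

```python
GENOME_SIZE_MB = 630   # genome size quoted in the text
MAP_LENGTH_CM = 680    # genetic map length quoted in the text


def cm_to_mb(distance_cm, genome_mb=GENOME_SIZE_MB, map_cm=MAP_LENGTH_CM):
    """Convert a genetic distance (cM) to an approximate physical distance (Mb).

    Assumes recombination is uniform along the genome, so that every cM
    corresponds to the genome-wide average of genome_mb / map_cm.
    """
    return distance_cm * genome_mb / map_cm


decay_mb = cm_to_mb(2)
print(round(decay_mb, 2))
```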

Variation in LD magnitudes across different germplasm and chromosomes

The extent of LD is usually affected by population structure. For example, Morrell et al. (2005) reported a rapid decay of LD in wild barley compared with domesticated barley, even though both are highly self-pollinated. Similar results were reported between the indica and japonica groups of rice (Mather et al., 2007), the wild and domesticated subspecies of grapevine (Barnaud et al., 2010) and the landrace and modern variety groups of wheat (Hao et al., 2011). Our results here showed that LD patterns in the two asparagus bean subdivisions are also different and that the rate of interchromosomal LD was reduced to, on average, 3.6% compared with 8% when using the combined gene pool (data not shown). Subgroup SV globally had a higher level and longer decay distance of LD, suggesting relatively limited historical recombination events and less genetic diversity in this more typical vegetable-type subdivision, perhaps because of a more severe historical population bottleneck. The distances and patterns of LD decay differ most between the two subgroups on LG 5, 7 and 11, suggesting that these chromosomes may carry more genes/QTLs related to agronomic traits favorable for vegetable use that have been strongly selected by cultivators and breeders of this crop. This is in good agreement with Ohta's LD variance component analysis. An important future task is to dissect the LD patterns between the two subgroups of asparagus bean germplasm more finely, using broader populations that may also include more ssp. unguiculata accessions, to gain a better understanding of the genetic architecture of cowpea natural resources globally.

Implications on genome wide association study in asparagus bean

For genome wide association mapping, long-range LD will reduce the number of markers needed to cover the genome, although the resolution for mapping is expected to be low (Yu and Buckler, 2006). The decay of LD at the cM scale that we observed in the Chinese asparagus bean germplasm suggests that a genome wide detection of QTLs is feasible using the currently available SNP maps and marker system. Assuming that one marker is needed per 1.5 cM along the 680 cM asparagus bean genome, about 450 informative markers evenly distributed across the genome would be sufficient to uncover any significant associations using genome wide association studies. Given the expressed sequence tag-derived nature of the SNP markers employed here, the power for association mapping is expected to be higher than with other marker systems such as genomic SSRs, the majority of which are located in the non-coding regions of the genome (Yan et al., 2009). In addition, the presence of population stratification and kinship among a subset of the plant materials, as well as the 3.6% interchromosomal associations, should all be taken into account in association analyses (Yan et al., 2009; Hao et al., 2011).
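The marker-number estimate above follows directly from the map length and the assumed marker spacing. The short Python sketch below reproduces that arithmetic; the function name is illustrative, and the calculation assumes perfectly even marker spacing, which real marker sets only approximate.

```python
import math


def markers_needed(map_length_cm, spacing_cm):
    """Smallest number of evenly spaced markers giving one marker per spacing_cm."""
    return math.ceil(map_length_cm / spacing_cm)


# Values from the text: a 680 cM genome and one marker per 1.5 cM.
n_markers = markers_needed(680, 1.5)
print(n_markers)  # 454, that is, roughly 450 markers
```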

To compensate for the possibly low mapping resolution caused by high-level LD in asparagus bean, a two-step procedure as proposed by Barnaud et al. (2010) could be considered for future work. Based on the finding here that landrace asparagus bean lines (and especially ssp. unguiculata) are genetically more diverse and thus should exhibit shorter LD decay distances, QTL locations determined from ‘standard’ asparagus bean populations could be refined by studying populations of more ‘rustic’ asparagus bean or ssp. unguiculata accessions with much more limited LD. In addition, bi-parental QTL mapping using large RIL populations or NIL sets would also provide higher resolution for QTLs of interest. In our lab, a project that aims to uncover the genetic factors controlling the pod length and climbing ability of asparagus bean using the combined two-step association mapping and bi-parental QTL mapping approach is now underway.

Data archiving

Data have been deposited at Dryad: doi:10.5061/dryad.6tv35cc2.