Introduction

Pineapple, Ananas comosus (L.) Merr., is a perennial herbaceous fruit crop belonging to the family Bromeliaceae. The crop is cultivated in all tropical and subtropical regions and ranks third in production among noncitrus tropical fruits, following banana (including plantain) and mango. The annual worldwide production reached 21.9 million metric tons in 2012, and the top seven producers (Brazil, Philippines, Thailand, Costa Rica, Indonesia, India, and China) jointly accounted for 90% of the global production (FAO, 2014).1 The pineapple plant is indigenous to South America.2,3 The putative center of origin is located in the Paraná–Paraguay River drainages between southern Brazil and Paraguay, based on the diversity distribution of related species and botanical varieties of pineapple in this region.4–6 However, the eastern part of the Guiana shield has also been hypothesized as the center of domestication for pineapple, based on the variation of chloroplast and nuclear DNA markers, the high level of phenotypic diversity, and the large number of primitive cultigens in this area.7,8 Pineapple was widely cultivated in the tropical Americas before the arrival of Christopher Columbus, the first European to see this fruit, in 1493.9

The introduction of pineapple into Asia and the Pacific began with the Spaniards in the early sixteenth century, and pineapple reached Africa in the mid-sixteenth century.4 Since then, there have been multiple introductions and exchanges of germplasm among the pineapple-producing countries.4 Although many landraces and traditional cultivars exist in the Americas, only a few cultivars have been dispersed to Asia and Africa for use in commercial production.4,10 About 70% of the world’s production comes from a single cultivar, Smooth Cayenne,10 which is a highly productive pineapple excellent for canning.11 The current fresh fruit pineapple market consists largely of two cultivars bred by the Pineapple Research Institute, CO-2 and MD-2.11 Developing new cultivars with desirable resistance and postharvest traits will depend on the available germplasm of this species. The United States Department of Agriculture (USDA) - Agricultural Research Service pineapple germplasm collection in Hilo, Hawaii, is one of the major collections in the world, along with the collections maintained by EMBRAPA/CNPMF in Cruz das Almas, Brazil, and by CIRAD-FLHOR in Martinique. As part of the ARS National Clonal Germplasm Repository for Tropical and Subtropical Fruit and Nut Crops, the collection at Hilo currently maintains over 180 accessions of pineapple cultivars and their wild relatives.

As with many other tropical perennial crops, pineapple germplasm is almost exclusively maintained by vegetative propagation through crowns, slips, suckers, or in vitro culture. Vegetative propagation has allowed the exchange of germplasm as clones among regions, countries, and continents. However, the exchange of vegetative planting materials has also created problems for conservators of pineapple germplasm, because records and labels of the cultivars have not always followed the same naming conventions and accessions carry limited information about their correct identity. As a result, homonyms and synonyms are common among the names of pineapple cultivars, which restricts the sharing of information and materials among pineapple researchers and hampers the use of pineapple germplasm in breeding.12–14 Another major challenge for pineapple cultivar identification is that protracted vegetative propagation has led to the accumulation of somatic mutations. Some mutations caused noticeable phenotypic effects and created intra-cultivar variation, which became the target of clonal selection.10 While these selected mutants are important in horticultural production, it is necessary to identify them so that breeders and genebank curators can efficiently conserve and use these genetic materials.

The utilization of biochemical and DNA molecular markers for pineapple germplasm management has been recently reviewed.14 Using isozyme markers, Aradhya et al. studied pineapple germplasm and found considerable variation within and between species of Ananas.15 In the Hawaii pineapple collection, they identified 66 distinct zymotypes that were able to differentiate all species and botanical varieties. Their results also suggested that, rather than genetic divergence due to reproductive isolating barriers, differentiation among the species of Ananas may be due to ecological isolation, and therefore may represent a species complex.

Both dominant DNA markers (amplified fragment length polymorphism, AFLP) and co-dominant markers (restriction fragment length polymorphism, RFLP; simple sequence repeat, SSR) have been used to assist pineapple cultivar identification and germplasm management.16–23 In spite of the significant progress in marker-assisted germplasm management over the last 20 years, cultivar identification in pineapple remains a challenging task. Using AFLP markers, Kato et al. characterized 148 A. comosus accessions maintained in the USDA pineapple collection in Hilo, Hawaii.20 They showed that a unique profile for major groups that had been classified by morphological traits, such as ‘Cayenne’, ‘Spanish’, and ‘Queen’, could not be established using AFLP-based DNA fingerprints. SSR markers likewise lacked congruence between phenotype and molecular marker-based classification in pineapple.22,23 Moreover, neither AFLP nor SSR is the most suitable marker tool for detecting duplicates in pineapple germplasm.

Single nucleotide polymorphisms (SNPs) are the most abundant class of polymorphisms in plant genomes. Compared to SSR markers, SNP analysis can be done without requiring DNA separation by size and can, therefore, be automated in high-throughput assay formats. The diallelic nature of SNPs facilitates a much lower error rate in allele calling and promotes compatibility between laboratories. These advantages have resulted in the increasing use of SNPs as the markers of choice for accurate genotype identification and diversity analysis in perennial crops, as recently demonstrated in cacao (Theobroma cacao, 2013),24 grapevine (Vitis vinifera, 2011),25 pummelo (Citrus maxima, 2014),26 strawberry (Fragaria spp., 2013),27 tea (Camellia sinensis, 2014), and longan (Dimocarpus longan, 2015). As with other perennial horticultural crops, DNA fingerprinting that uses a small set of SNP markers is in great demand by the pineapple community for a broad range of research and field applications. These applications include, but are not limited to, identification of mislabeled accessions, parentage and sibship analysis for quality control in breeding and seed programs, and authentication and traceability to support the production of high-value clones for premium markets. Nonetheless, this powerful tool has not yet been applied to pineapple germplasm management.

Ample genomic resources have been developed for pineapple.14,28–31 The premier online database, ‘PineappleDB’ (http://genet.imb.uq.edu.au/Pineapple/index.html), includes more than 5,600 expressed sequence tags (ESTs) with 3,383 consensus sequences. The comprehensive sequence, bioinformatics, and functional classification of EST resources are available for text or sequence-based searches. A draft genome of A. comosus has been developed, which covers about 375 Mb (62%) of the estimated 526 Mb genome of this species.14 These readily available genomic resources provide opportunities for mining new markers to use for pineapple germplasm management and breeding. The objectives of the present study were to develop SNP markers through the data mining of ESTs and transcriptome data and to assess their potential application for pineapple cultivar identification. The results reported herein represent the first validation study of SNPs in pineapple, demonstrating the utility of transcriptome data as a basis for rapid development of a high-quality genotyping tool. These SNP markers, as well as the genotyping method, will be particularly useful for intellectual property rights in varietal protection, germplasm management, and pineapple breeding programs.

Materials and methods

Mining of putative SNPs from EST and nucleotide sequences

All available nucleotide sequences of Ananas spp. were downloaded from NCBI GenBank (http://www.ncbi.nlm.nih.gov, 4 October 2014). Redundant entries were examined and excluded using the CD-HIT program with a 95% sequence similarity threshold. The FASTA-formatted files of pineapple were merged into a single data set for further data mining. Putative EST-SNPs were detected using the QualitySNP program.32 All selected clusters included a minimum of six EST sequences, whereas both the minimum redundancy threshold and the minimal confidence score required by QualitySNP were set at three. In order to meet the requirements and constraints for primer design, all candidate SNP markers with fewer than 50 nucleotides between two neighboring SNPs were removed. A subset of 96 identified SNP sequences was then chosen for design and manufacture of SNP assays.
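The spacing constraint described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function name `filter_by_spacing`, the `(contig, position)` input format, and the reading of "fewer than 50 nucleotides between neighbors" as a simple difference in coordinates are all assumptions made for illustration.

```python
from collections import defaultdict


def filter_by_spacing(snps, min_gap=50):
    """Keep only SNPs whose nearest neighbours on the same contig are
    at least `min_gap` positions away (hypothetical helper).

    snps: iterable of (contig_id, position) tuples.
    """
    by_contig = defaultdict(list)
    for contig, pos in snps:
        by_contig[contig].append(pos)

    kept = []
    for contig, positions in by_contig.items():
        positions.sort()
        last = len(positions) - 1
        for i, pos in enumerate(positions):
            # A SNP at either end of the list has only one neighbour to check.
            left_ok = i == 0 or pos - positions[i - 1] >= min_gap
            right_ok = i == last or positions[i + 1] - pos >= min_gap
            if left_ok and right_ok:
                kept.append((contig, pos))
    return kept


# Made-up coordinates: the two SNPs 30 nt apart on contig "c1" are both dropped.
candidates = [("c1", 100), ("c1", 130), ("c1", 300), ("c2", 40)]
print(filter_by_spacing(candidates))
```

Note that both members of a too-close pair are discarded, since either one would place a second polymorphism inside the other's primer region.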

Validation of putative SNPs

To evaluate the suitability of the putative SNP markers for cultivar identification, we used a nanofluidic genotyping system and validated the SNPs on 170 pineapple accessions (Table 1; Supplementary Table S1). Leaf samples from the pineapple collection maintained by the USDA-ARS Tropical Plant Genetic Resources and Disease Unit, at the National Plant Germplasm Repository in Hilo, Hawaii (http://www.ars.usda.gov/main/site_main.htm?modecode=20-40-05-10), were harvested and dried in silica gel. DNA was extracted from dried pineapple leaves with the DNeasy Plant Mini kit (Qiagen Inc., Valencia, CA, USA), which is based on the use of silica as an affinity matrix. The dry leaf tissue was placed in a 2-mL microcentrifuge tube with one quarter-inch ceramic sphere and 0.15 g garnet matrix (Lysing Matrix A; MP Biomedicals, Solon, OH, USA). The leaf samples were disrupted by high-speed shaking in a TissueLyser II (Qiagen Inc.) at 30 Hz for 1 min. Lysis solution (DNeasy kit buffer AP1 containing 25 mg/mL polyvinylpolypyrrolidone), along with RNase A, was added to the powdered leaf samples and the mixture was incubated at 65 °C, as specified in the kit instructions. The remainder of the extraction method followed the manufacturer’s suggestions. DNA was eluted from the silica column with two washes of 50 µL buffer AE, which were pooled, resulting in 100 µL DNA solution. DNA concentration was determined by absorbance at 260 nm using a NanoDrop spectrophotometer (Thermo Scientific, Wilmington, DE, USA). DNA purity was estimated by the 260:280 and 260:230 absorbance ratios.

Table 1 Species categories of 170 A. comosus accessions in USDA-ARS pineapple collection at Hilo, Hawaii.

Ninety-six putative SNP sequences were submitted to the Assay Design Group at Fluidigm Corp. (South San Francisco, CA, USA) for design and manufacture of primers for a SNPtype genotyping panel. The assays were based on competitive allele-specific PCR, and they enable bi-allelic scoring of SNPs at specific loci (KBioscience Ltd, Hoddesdon, UK). The Fluidigm SNPtype Genotyping Reagent Kit was used according to the manufacturer’s instructions.34,35 Using these primers, the isolated DNAs were subjected to Specific Target Amplification in order to enrich the SNP sequences of interest.34 Genotyping was performed on a nanofluidic 96.96 Dynamic Array IFC (Integrated Fluidic Circuit; Fluidigm Corp.). This chip automatically assembles PCRs, enabling simultaneous testing of up to 96 samples with 96 SNP markers. The use of a 96.96 Dynamic Array IFC for SNP genotyping of human samples has been described by Wang et al.33 End-point fluorescent images of the 96.96 IFC were acquired on an EP1 imager (Fluidigm Corp.). The data were analyzed with Fluidigm Genotyping Analysis Software (Fluidigm Corp.).36

Data analysis

Duplicate accessions were identified using pairwise multilocus matching among all individual samples. DNA samples that were fully matched at the genotyped SNP loci were declared the same cultivar or clones. The program GenAlEx 6.5 (2006, 2012) was used for computation.37,38
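The matching rule above (identical genotypes at every scored locus) can be sketched in a few lines of Python. This is an illustration only, not the GenAlEx procedure: the function name `synonymous_groups`, the two-letter genotype strings, and the accession names are hypothetical, and missing calls are not handled.

```python
from collections import defaultdict


def synonymous_groups(genotypes):
    """Group accessions whose multilocus SNP profiles match at every locus.

    genotypes: dict mapping accession name -> sequence of two-letter calls,
    e.g. ("AA", "AG", "GG"). Hypothetical input format.
    """
    groups = defaultdict(list)
    for name, profile in genotypes.items():
        # Sort the two alleles so that "AG" and "GA" count as the same call.
        key = tuple("".join(sorted(call)) for call in profile)
        groups[key].append(name)
    # Only profiles shared by two or more accessions form a synonymous group.
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Made-up accessions and calls, for illustration only.
example = {
    "acc_1": ("AA", "AG", "GG"),
    "acc_2": ("AA", "GA", "GG"),
    "acc_3": ("AA", "AA", "GG"),
}
print(synonymous_groups(example))
```

Hashing each full profile avoids an explicit all-against-all comparison while yielding the same groups as pairwise matching under the exact-match criterion.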

After duplicate identification, the redundant samples were removed and descriptive statistics for measuring the informativeness of the SNP markers were calculated based on the remaining distinctive cultivars. The key descriptive statistics included minor allele frequency, observed heterozygosity, expected heterozygosity, Shannon’s information index, and inbreeding coefficient. Computations were carried out using the same program.
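For readers who want to see how these statistics relate to one another, the following Python sketch computes them for a single diallelic locus. It uses the standard textbook definitions (expected heterozygosity as 1 − Σp², Shannon's index as −Σp ln p, and the inbreeding coefficient as 1 − Ho/He); the function name `locus_stats` and the genotype encoding are assumptions, and the sketch is not a substitute for the GenAlEx computation.

```python
import math
from collections import Counter


def locus_stats(calls):
    """Summary statistics for one SNP locus (hypothetical helper).

    calls: list of two-letter genotype strings such as "AA", "AG", "GG".
    """
    alleles = Counter(allele for call in calls for allele in call)
    total = sum(alleles.values())
    freqs = [count / total for count in alleles.values()]

    maf = min(freqs) if len(freqs) > 1 else 0.0          # minor allele frequency
    ho = sum(1 for c in calls if c[0] != c[1]) / len(calls)  # observed heterozygosity
    he = 1.0 - sum(p * p for p in freqs)                  # expected heterozygosity
    shannon = -sum(p * math.log(p) for p in freqs)        # Shannon's information index
    f = 1.0 - ho / he if he > 0 else float("nan")         # inbreeding coefficient
    return {"maf": maf, "ho": ho, "he": he, "shannon": shannon, "f": f}


# Made-up calls: two homozygotes and two heterozygotes.
print(locus_stats(["AA", "AG", "AG", "GG"]))
```

For a diallelic locus, expected heterozygosity cannot exceed 0.5 and Shannon's index cannot exceed ln 2 ≈ 0.693; both maxima are reached when the two alleles are equally frequent.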

A cluster analysis using the neighbor-joining (NJ) method was used to further examine the genetic relationship among accessions. Kinship coefficient was chosen as genetic distance measurement of shared ancestry among the individual accessions. The computation was executed using MICROSATELLITE ANALYZER (MSA, 2003).39 A dendrogram was generated from the resulting distance matrix using the NJ algorithm available in PHYLIP.40,41 The unrooted tree was visualized using the web-based tool Interactive Tree of Life v2 (http://itol.embl.de/).42

A model-based clustering algorithm implemented in the STRUCTURE software program was applied to the SNP data.43 This algorithm attempted to identify genetically distinct subpopulations based on allele frequencies. The admixture model was applied and the number of clusters (K-value), indicating the number of subpopulations the program attempted to find, was set from 1 to 10. The analyses were carried out without assuming any prior information about the genetic group or geographic origin of the samples. Ten independent runs were assessed for each fixed number of clusters (K), each consisting of 1 × 10^6 iterations after a burn-in of 2 × 10^6 iterations. The ΔK value was used to detect the most probable number of clusters and the computation was performed using the online program STRUCTURE HARVESTER.44 Of the 10 independent runs, the one with the highest Ln Pr (X|K) value (log probability or log likelihood) was chosen and represented as bar plots.
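The ΔK criterion can be made concrete with a short Python sketch. It assumes the replicate Ln Pr (X|K) values are already available as a dictionary, and it takes the second difference of the per-K mean log likelihoods divided by the standard deviation across runs, which is one common reading of the ΔK formula; the function name `delta_k` and the numbers in the example are made up for illustration.

```python
from statistics import mean, stdev


def delta_k(lnp):
    """Compute a Delta-K value for each K that has both neighbours.

    lnp: dict mapping K -> list of Ln Pr(X|K) values, one per replicate run.
    Returns a dict K -> Delta-K (hypothetical helper, not STRUCTURE HARVESTER).
    """
    means = {k: mean(values) for k, values in lnp.items()}
    result = {}
    for k in sorted(lnp):
        # The second difference needs K-1 and K+1, so the smallest and
        # largest tested K never receive a Delta-K value.
        if k - 1 in means and k + 1 in means:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            sd = stdev(lnp[k])
            result[k] = second_diff / sd if sd > 0 else float("inf")
    return result


# Made-up log likelihoods for K = 1..4 with two runs each.
runs = {1: [-1000, -1002], 2: [-800, -802], 3: [-790, -792], 4: [-785, -787]}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

Because ΔK is undefined at the smallest tested K, this criterion cannot by itself indicate K = 1 as the best-supported number of clusters.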

To test the hypothesis that vars. ananassoides, bracteatus, and erectifolius are the putative progenitors of cultivated pineapples, we applied parentage analysis to verify the origin of the 53 accessions in A. comosus var. comosus (as labeled in Table 1). These cultivars or breeding lines were considered as ‘offspring’ for which parentage analyses were carried out. A. comosus vars. ananassoides, bracteatus, and erectifolius were used as candidate parents. A likelihood-based method implemented in the program CERVUS 3.0 was used for computation.45,46 For each parent–offspring pair, the natural logarithm of the likelihood ratio (LOD score) was calculated. Critical LOD scores were determined for the assignment of parentage to a group of individuals without knowing the maternity or paternity. Simulations were run for 10 000 cycles, assuming that 10% of candidate parents were sampled and that 90% of loci were typed, with a 1% typing error rate. The most probable single mother (or father) for each offspring was identified on the basis of the critical difference in LOD scores (D) between the most likely and the next most likely candidate parent at greater than 95 or 80% confidence.45,46
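The idea behind a LOD score can be illustrated with a deliberately simplified Python sketch. It compares the probability of an offspring genotype given one candidate parent (with the other allele drawn at random from the population) against its probability under Hardy-Weinberg proportions, and it blends in a typing error rate in a simple way. This is not the CERVUS likelihood model; the function names, the genotype coding (0, 1, or 2 copies of a reference allele), and the error treatment are assumptions made for illustration.

```python
import math


def transmission_prob(offspring, parent, p):
    """P(offspring genotype | candidate parent), other allele drawn at random.

    Genotypes are coded as the number of copies (0, 1, 2) of allele A,
    whose population frequency is p.
    """
    q = 1.0 - p
    t = parent / 2.0  # chance that the candidate parent passes on allele A
    if offspring == 2:
        return t * p
    if offspring == 0:
        return (1.0 - t) * q
    return t * q + (1.0 - t) * p


def hwe_prob(genotype, p):
    """Genotype probability for an unrelated individual under Hardy-Weinberg."""
    q = 1.0 - p
    return {2: p * p, 1: 2.0 * p * q, 0: q * q}[genotype]


def lod_score(offspring, parent, freqs, error=0.01):
    """Sum of per-locus log likelihood ratios (parent versus unrelated).

    Assumes polymorphic loci (0 < p < 1); loci with a missing call (None)
    are skipped.
    """
    lod = 0.0
    for go, gp, p in zip(offspring, parent, freqs):
        if go is None or gp is None:
            continue
        unrelated = hwe_prob(go, p)
        # With probability `error` the observed genotype is treated as uninformative.
        related = (1.0 - error) * transmission_prob(go, gp, p) + error * unrelated
        lod += math.log(related / unrelated)
    return lod


# Made-up genotypes at three loci with allele frequency 0.5.
child = [2, 1, 0]
print(lod_score(child, [2, 1, 0], [0.5] * 3))  # compatible candidate: positive
print(lod_score(child, [0, 1, 2], [0.5] * 3))  # mismatches at two loci: negative
```

In this simplified setting, the difference between the highest and second-highest LOD among candidates plays the role of the statistic D described above; the critical values still have to come from simulation.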

Results

SNP discovery

A total of 13 203 mRNA nucleotide and 5941 EST sequences from pineapple were gathered using methods previously described. After adapter removal, trimming, and quality control, 18 241 higher quality sequences were selected. The program CAP3,47 using default parameters, was used to assemble sequences into 1793 contigs and 11 809 singlets with an average size of 3.59 sequences per contig, among which putative SNPs were detected in only 48 contigs using the QualitySNP program. Each of these selected clusters included a minimum of six EST sequences. In total, we obtained 213 putative SNPs, including 75 C/T, 59 A/G, 10 A/T, 12 A/C, 4 T/G, 11 C/G, 41 indel, and 1 high tri-allelic polymorphism. To select high-quality SNPs for validation, candidate SNP sites were filtered to retain those with at least 50 bp of sequence before and after the site. We calculated the number of all sequences in a cluster and the number containing the SNP type in this cluster. We then selected 96 SNPs for validation by genotyping the 170 pineapple accessions in the USDA-ARS pineapple collection.

Screening for polymorphic SNP markers

Out of the chosen 96 SNP markers, 80 were successful for genotyping. The failure of the remaining 16 SNPs was likely due to the sequence complexity or the presence of polymorphisms within the flanking sequences. However, among the 80 successful SNPs, 23 were monomorphic across the 170 pineapple samples (i.e. only one SNP variant was identified in all individuals). These monomorphic markers may have resulted from errors in transcriptome sequencing, which then led to the incorrect identification of SNP. It is also possible that some of these SNPs may correspond to rare alleles that were not present in the set of pineapple accessions we analyzed. A total of 57 polymorphic SNPs were retained for further analysis of this sample set. These 57 SNPs were reliably scored across the validation panel and thus were considered true SNPs. The flanking sequences of these 57 SNPs are listed in Table 2.

Table 2 Flanking sequences and SNPs of the 57 polymorphic markers.

Cultivar identification

SNP profiles of the multiple accessions from the same pineapple cultivar showed that genotyping results were highly consistent (Table 3). Multilocus matching of SNP fingerprints revealed a high rate of duplicates in this pineapple collection. A total of 130 accessions could be classified into 24 synonymous groups (Table 4). The largest synonymous group, which includes 36 accessions, was found in cultivar Cayenne. It is also noticeable that some accessions within the same synonymous group have apparent morphological differences, despite matching SNP profiles, indicating somaclonal mutation within the synonymous group. For example, Cayenne 7898 QC has atypical yellow flesh color, whereas Cayenne 7898 4N has a white color, but their SNP profiles are the same (Figure 1).

Table 3 Examples of DNA fingerprints based on the array of 57 SNPs for pineapple genotype identification (showing truncated profiles).
Table 4 List of 24 synonymous groups, including 130 accessions, identified by SNP markers in USDA pineapple collection, Hilo, Hawaii.
Figure 1 Somaclonal mutation of clone ‘Cayenne 7898’ showing the difference in dark yellow (Cayenne 7898 4N, HANA 97) and white (Cayenne 7898, HANA96) flesh colors.

Descriptive statistics and clustering analysis of 64 distinctive pineapple accessions

From each of the synonymous groups, only one accession was retained and used for subsequent diversity analysis. Among the 170 genotyped accessions, there were 64 accessions with a unique SNP profile. Descriptive statistics were then computed for the 57 polymorphic SNPs across the 64 pineapple accessions with a unique SNP profile and the result is presented in Table 5. The minor allele frequencies of these 57 SNPs ranged from 0.090 to 0.495 with an average of 0.324. The mean information index was 0.601, ranging from 0.304 to 0.693. The observed heterozygosity ranged from 0.110 to 0.935 with an average of 0.520, whereas the mean expected heterozygosity was 0.414 ranging from 0.164 to 0.500 (Table 5).

Table 5 Minor allele frequency, information index, heterozygosity, and inbreeding coefficient of the 57 SNP loci scored on 64 pineapple accessions.

The unrooted NJ tree grouped the 64 accessions into three main clusters (Figure 2). The clustering patterns reflected relationships among accessions based on botanical variety or geographical origin. The first cluster includes all the accessions of A. comosus vars. ananassoides, bracteatus, and erectifolius, as well as the hybrids derived from these related botanical varieties. Within this cluster, vars. ananassoides, bracteatus, and erectifolius are clearly separated. This cluster also included several cultivated pineapple clones, such as Bogota, Pina Lisa, and Criolla from Colombia, Bermuda from Barbados, Cayenne Lot 520 from Hawaii, Cabezona from Puerto Rico, and Trinidad from Trinidad. The proximity between these cultivars and the two related botanical varieties indicates that these cultivars are either selected or derived from vars. ananassoides and bracteatus. The two Bolivian accessions (N94-92 Short Fruit#1 and N94-92 Long Fruit#2) were labeled as Ananas species in their passport record data. The cluster result showed that they should be A. comosus var. ananassoides or hybrids derived from A. comosus var. ananassoides.

Figure 2 NJ unrooted tree depicting the relationship among 64 pineapple accessions from USDA-ARS, Pacific Basin Tropical Plant Germplasm Resource Center in Hilo, Hawaii. Identification of accessions corresponds to samples listed in Table 1 and Supplementary Table S1.

The second cluster comprised exclusively A. comosus var. comosus, including several well-known cultivars such as Cayenne Hilo, Mauritius, and Antigua. Since these three cultivars represent the reference horticultural groups ‘Cayenne’ (Cayenne Hilo) and ‘Queen’ (Mauritius and Antigua), their grouping here in one main cluster demonstrated that the differences used to designate membership of these two horticultural groups are relatively small in comparison with the differences among the other botanical varieties. The third cluster includes 26 cultivated pineapples that formed a large and diverse group. Within this large cluster there are several important pineapple cultivars such as Criolla from Mexico, Montelirio from Guatemala, and Pernambuco and Manzana from Brazil. The majority of the accessions in this cluster appeared to be cultivated mainly in South America.

Assignment test by STRUCTURE

Population stratification of the 64 accessions, based on the ΔK value computed by STRUCTURE HARVESTER, revealed two clusters as the most probable value of K (Figures 3 and 4), and this partitioning was largely compatible with the cluster analysis (Figure 2). All the accessions related to var. ananassoides were assigned to one Bayesian cluster, whereas the cultivated germplasm, as well as vars. bracteatus and erectifolius, were grouped in a different cluster. The F1 hybrid of Wild Brazil × Plot 520 was confirmed by analysis with STRUCTURE. In addition, several accessions were classified as hybrids of the two clusters, such as N94-92, F1 Ananassoides × Plot 435, Wild Brazil × Cayenne Lot 520, and Cb 32 (Figures 3 and 4). All the accessions assigned by STRUCTURE to the cluster of var. ananassoides or its hybrids were in the first cluster of the NJ tree.

Figure 3 Plot of ΔK (filled circles, solid line) calculated as the mean of the second-order rate of change in likelihood of K divided by the standard deviation of the likelihood of K, m|L″(K)|/s[L(K)].

Figure 4 Clusters inferred by STRUCTURE across all analyzed pineapple accessions. Each vertical line represents one individual multilocus genotype. Each color represents one cluster, indicating the most likely ancestry from which the genotype or partial genotype was derived. Individuals with multiple colors have admixed genotypes from multiple clusters.

Parentage analysis

Among the 52 cultivars and hybrids derived from related botanical varieties, paternal or maternal parents were assigned (>80% confidence level) to 14 accessions (Table 6). A. comosus var. ananassoides was responsible for parentage of three accessions including Bogota, Bermuda, and Pina Lisa, whereas A. comosus var. bracteatus was assigned to parentage of 10 accessions. No parentage was assigned to A. comosus var. erectifolius. The result of parent–offspring assignment is largely compatible with the cluster analysis (Figure 2). Accessions assigned as offspring from the same parent tended to be grouped together in the NJ tree (Figure 2). For example, CB 17 was found to be the likely progenitor for Mauritius, Phu Qui, and Congo, all of which grouped together in the same subcluster in group 3 (Figure 2).

Table 6 Likelihood assignment of parentage of pineapple (A. comosus var. comosus) accessions based on 57 SNP markers with LOD scores above 80% probability.

Discussion

Despite substantial progress in genomics research on pineapple, advanced molecular tools to support germplasm management are not available. Developing SNP markers from transcriptome sequences has been considered an efficient strategy for non-model species. In the present study, we validated 96 SNP markers based on the transcriptome sequences of pineapple at various development stages and used them to genotype a diverse panel of cultivated and wild germplasm. We obtained a success rate of approximately 60% for marker validation, which demonstrated that this approach can serve as a shortcut for SNP development. As shown in the present study, even a small set of SNP markers can significantly improve accuracy and efficiency in germplasm management.

Pineapple cultivar identification

Reliable identification of pineapple cultivars is invaluable for germplasm conservation and cultivar protection. The present study demonstrated that the set of 57 SNP markers was effective for assessing the genetic identity of pineapple germplasm. Results from multiple clones of the same cultivar showed 100% concordance, demonstrating that the nanofluidic system is a reliable platform for generating pineapple DNA fingerprints with high accuracy. The present result revealed a high rate of genetic redundancy in this pineapple collection. Some of the identified duplicates are well-documented synonymous cultivars. For example, the Cayenne cultivars are known to be derived from a few ancestral pineapple plants that originated from Cayenne, French Guiana.10,48 However, the majority of the clones or synonymous groups, such as Pernambuco vs. Sugar Loaf, Spanish Samoa vs. Natal, and Ruby vs. Los Banos, have been less well known to the pineapple community. Identification of these clone groups will significantly facilitate the efficient exchange, conservation, and use of pineapple germplasm.

However, caution is needed in the interpretation of genetic redundancy in pineapple. It is well known that somatic mutation is common in pineapple. Mutations affecting many phenotypic traits, such as spiny leaves, fruit flesh color, acidity, and sugar content of fruit, have been well documented. These somatic mutations are the major source of variation exploited for the selection of new cultivars. For example, the spiny or smooth leaf margins, caused by a single gene,10 are the signature character for the cultivar group Smooth Cayenne. Such a mutation is difficult to detect when a small set of molecular markers is applied. Similar problems were found in fingerprinting projects dealing with other vegetatively propagated crops such as bananas (Musa spp., 2014),49 breadfruit (2015),50 and apple (Malus spp., 2012).51 More comprehensive genomic approaches, such as next-generation sequencing, would be needed to detect which genes or alleles had been changed, thereby causing the phenotypic variation. For this reason, the reduction of identified duplicates in a pineapple germplasm genebank needs to be considered on a case-by-case basis. Characterization of phenotypic traits among the synonymous group members is still essential to complement DNA fingerprinting for genotype identification.

Classification of pineapple germplasm

A. comosus is a mostly self-incompatible diploid with 2n=2x=50 chromosomes.52,53 The species includes five botanical varieties: vars. comosus, ananassoides, parguazensis, erectifolius, and bracteatus, based on the revised classification of Coppens d’Eeckenbrugge and Leal.7 The present results show that out of the 170 Ananas accessions maintained in the USDA pineapple collection, there are only 64 distinctive genotypes. Clustering analysis and model-based stratification both showed that A. comosus var. ananassoides differs from A. comosus var. bracteatus, thus supporting the revised taxonomy system that classified A. comosus var. ananassoides and var. bracteatus as two different botanical varieties.7 However, accessions of var. erectifolius were found to be highly similar to, and to group closely with, the accessions of var. bracteatus. This result differs from a previous report based on isozyme variation,15 which showed that A. comosus var. erectifolius did not have a distinctive isozyme profile in comparison with the rest of the A. comosus var. comosus cultivars. Nonetheless, the present study used only two accessions of A. comosus var. erectifolius, which may bias the sample representation. Additional samples of A. comosus var. erectifolius from other genebanks need to be examined and a larger number of SNP markers need to be analyzed to clarify whether the classification of A. comosus var. erectifolius should be revised.

The second observation is that several cultivated pineapple accessions (Bogota, Pina Lisa, Bermuda, Cayenne Lot 520, Cabezona, and Trinidad) were grouped together with A. comosus var. ananassoides or A. comosus var. bracteatus, instead of with the rest of the A. comosus var. comosus accessions (Figures 2 and 3). This result indicates that the current system that classifies all cultivated pineapple into a single botanical variety (A. comosus var. comosus) may be questionable. It would be appropriate to consider cultivated pineapple as a complex of different botanical varieties, with possible significant gene flow among them.

The third observation concerns the validity of the horticultural classification of pineapple germplasm. Pineapple cultivars are classified into several horticultural groups. The commonly known groups include ‘Abacaxi’, ‘Cayenne’, ‘Maipure’ or ‘Perolera’, ‘Queen’, and ‘Spanish’.10,54,55 Although these horticultural groups have been adopted by many users, little investigation has been done to establish a genetic basis for this categorization. Kato et al. examined the efficacy of the horticultural groups and reported that the classifications of ‘Cayenne’, ‘Spanish’, and ‘Queen’ were not well supported by AFLP analysis.20 Shoda et al. analyzed 31 pineapple accessions using SSR markers.22 Their results also showed disagreement between the horticultural type and the results of the SSR analysis. The current study showed that the ‘Cayenne’ cultivars have a distinguishable genetic identity, and most of the affiliated accessions were grouped in a single cluster. However, accessions in the other groups did not appear well clustered. For example, cultivars Mauritius and Antigua are two well-known reference cultivars in the ‘Queen’ group, but in the NJ tree (Figure 2) they were separated in different subclusters, where cv. Antigua showed higher proximity with the ‘Cayenne’ group than with Mauritius. Similar discordance was found between cultivars of the ‘Spanish’ group (Figure 2). Therefore, our results support the previous conclusions of Kato et al.20 and Shoda et al.22 that the classification of pineapple cultivars into horticultural groups lacks a consistent genetic basis. This classification appears to need revision, supported by new evidence generated by DNA markers.

Putative progenitors of pineapple

Parentage analysis showed that both vars. bracteatus and ananassoides can be progenitors of pineapple cultivars (Table 6). This result is in agreement with the fact that both var. bracteatus and var. ananassoides can intercross successfully with var. comosus to produce fertile offspring.7,56 Coppens d’Eeckenbrugge and Leal hypothesized that var. ananassoides is the likely progenitor of cultivated pineapple, and that domestication likely happened in the Guiana shield.7 One strong piece of evidence supporting this hypothesis is that all four chloroplast haplotypes that have been identified in cultivated materials are present in the wild var. ananassoides.7 On the other hand, var. bracteatus was not considered as a progenitor in this hypothesis, mainly because var. bracteatus appeared to be a homogeneous variety with narrow genetic diversity, which is an unlikely basis for diverse domesticated cultigens of pineapple.7 The present result, however, shows that 11 pineapple cultivars (Canterra, Papuri Vaupes Colombia, CB 30, Pina de Castilla, Rondon, Congo, Phu Qui, Mauritius, Cheese pine), which are dispersed across different clusters in the NJ tree (Figure 2), could have their parentage (either male or female) traced back to var. bracteatus (Table 6). Ananas comosus var. bracteatus is native to Brazil, Bolivia, Argentina, Paraguay, and Ecuador but not to the Guiana shield. The present result thus indicates the possibility that pineapple could have been domesticated at multiple sites, involving both var. ananassoides and var. bracteatus. The Paraná–Paraguay river drainage area could be one of the domestication sites, since both var. bracteatus and var. ananassoides are indigenously distributed in this area.4,5 Geographically disparate origins of crop domestication are not uncommon in the Americas, as in the case of common bean (Phaseolus vulgaris), chili pepper (Capsicum spp.), potato (Solanum spp.), and cacao (T. cacao), as reviewed by Clement et al.57

In conclusion, we developed a set of SNP markers for pineapple and employed them for fingerprinting the USDA’s pineapple collection, using a nanofluidic array. This approach enabled us to generate high-quality SNP profiles for the purpose of pineapple cultivar identification. This is a highly useful tool for genebank management, which will also lead to more efficient crop improvement and, furthermore, has the potential to protect intellectual property rights of breeders. Our results also generated significant insight regarding the origin and domestication of pineapple. Efforts to sequence multiple cultivars from the same synonymous groups with somaclonal mutations are underway, in order to gain a comprehensive understanding of the genetic basis for mutation-based changes in important agronomic traits. This information will be highly useful for verification of pineapple cultivars and will improve the efficiency of pineapple genebank operation. The high rate of genetic redundancy detected in this collection also suggests the potential impact of applying this technology to other tropical perennial crops.