Revealing the complex genetic structure of cultivated amaryllis (Hippeastrum hybridum) using transcriptome-derived microsatellite markers

Although amaryllis (Hippeastrum hybridum) plants are commonly used in physiological and ecological research, their genomic and genetic resources remain limited. The development of molecular markers is therefore of great importance to accelerate genetic improvement in Hippeastrum species. In this study, a total of 269 unique genes that might regulate the flower spathe development of amaryllis were identified. In addition, 2000 simple sequence repeats (SSRs) were detected based on 171,462 de novo assembled unigenes from transcriptome data, and 66,4091 single nucleotide polymorphisms (SNPs) were also detected as putative molecular markers. Twenty-one SSR markers were screened to evaluate the genetic diversity and population structure of 104 amaryllis accessions. A total of 98 SSR loci were amplified for all accessions. The results reveal that Nei’s gene diversity (H) values of these markers ranged between 0.055 and 0.394, whereas the values of Shannon’s Information index (I) ranged between 0.172 and 0.567. Genetic tree analysis further demonstrates that all accessions can be grouped into three main clusters, each of which can be further divided into two subgroups. STRUCTURE-based analysis revealed that the highest ΔK values were observed when K = 5, K = 6, K = 7 and K = 8. The results of this study enable large-scale transcriptomics and classification of Hippeastrum genetic polymorphisms and will be useful in the future for resource conservation and production.

The genus Hippeastrum in the family Amaryllidaceae includes 75 species from South America, except for one from the West African taxa 1 . Hippeastrum, which is also commonly known as the giant amaryllis, is a bulbous perennial plant that is characterized by a ploidy level that ranges between diploid and octoploid 2 . These plants typically produce between three and six glossy strap-like leaves, which are approximately 600 mm long and 50 mm wide 1,3 ; flowers appear simultaneously with the leaves in this genus, and one plant can produce one or two inflorescences. Each of these clusters contains between two and five (mostly four) large trumpet-shaped flowers, which have a scape that is typically hollow and can grow up to 550 mm in size 3,4 . Flowers vary from pure white to brilliant red. The flowers are zygomorphic and approximately 200 mm in diameter, exhibiting numerous variations in colour and striping. Amaryllis seeds are flat and have black, papery wings. The plant is well adapted to a warm and humid environment. In some parts of the world, such as Bangladesh, agro-ecological conditions are extremely

Results
Illumina paired-end sequencing and de novo assembly of the amaryllis transcriptome. Sequencing of the amaryllis cv. Blossom Peacock complementary DNA (cDNA) library on the Illumina HiSeq2000 platform generated a total of 28,974,793 raw reads comprising 5.85 Gbp of nucleotides. After strict filtering to trim adaptors and remove low-quality sequences, 27,273,776 high-quality clean reads remained. De novo assembly with Trinity software 25 produced 380,731 transcripts, which ultimately yielded 171,462 unigenes with a total length of 84,359,807 bp (mean = 492 bases, maximum = 8,272 bases, minimum = 201 bases) and an N50 of 610 bases (Table 1). The majority (74.69%; 128,058) of these unigenes were between 200 and 500 bp long. In addition, 14.66% (25,138) were between 500 and 1,000 bp long, and 10.65% (18,266) were longer than 1,000 bp.
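The assembly statistics above can be re-derived from a list of unigene lengths. The following Python sketch is illustrative only and is not the pipeline used in the study: the function names are hypothetical, and the constants are simply the totals reported in the text, used here to check that the mean length and bin percentages are internally consistent.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of all assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Totals reported in the text (Table 1); not recomputed from raw data.
TOTAL_UNIGENES = 171_462
TOTAL_BASES = 84_359_807
LENGTH_BINS = {"200-500 bp": 128_058, "500-1,000 bp": 25_138, ">1,000 bp": 18_266}


def mean_length(total_bases, n_sequences):
    """Mean sequence length, rounded to the nearest base."""
    return round(total_bases / n_sequences)


def bin_percentages(bins, n_sequences):
    """Percentage of sequences falling in each length bin (two decimals)."""
    return {name: round(100 * count / n_sequences, 2) for name, count in bins.items()}
```

With the reported totals, `mean_length(TOTAL_BASES, TOTAL_UNIGENES)` gives 492 and the three bins sum to all 171,462 unigenes, in agreement with the figures quoted above. The N50 itself can only be recomputed from the full list of unigene lengths, which is not reproduced here.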

Sequence annotation.
To further elucidate the potential functions of the assembled unigenes, all 171,462 unigenes were subjected to BLAST searches against diverse databases (Table S1). Transcriptome sequences were initially searched against the National Center for Biotechnology Information (NCBI) non-redundant (Nr) and UniProt Swiss-Prot protein databases using BLASTX. The search was executed with a cut-off E-value of 10⁻⁵ and a sequence similarity greater than 30%. The results of this comparison revealed that 47,359 (27.62%) and 27,089 (15.80%) unigenes exhibited significant homology to sequences in the Nr and Swiss-Prot databases, respectively (Table 1).
To identify the biological pathways represented by the amaryllis unigenes assembled in this study, we compared our data with those in the KEGG database. A total of 18,048 unigenes could be assigned to 301 pathways from this database, which fall into seven categories: 'cellular processes', 'drug development', 'environmental information processing', 'genetic information processing', 'human diseases', 'metabolism', and 'organismal systems'. The KEGG group containing the largest number of amaryllis unigenes was 'carbohydrate metabolism' (748) within the 'metabolism' category, followed by 'folding, sorting and degradation' (2,073) and 'translation' (1,705) within the 'genetic information processing' category as well as the 'infectious diseases' (1,737) group within the 'human diseases' category (Fig. 2C and Table 1).

Search for flower developmental genes.
We searched for candidate genes known to be involved in flower determination and development and identified 269 unigenes. These unigenes were assigned to the anthocyanin biosynthetic pathway (125), carotenoid biosynthesis pathway (30), specification of floral organ identity (13), photoperiod pathway (32), vernalization pathway (8), gibberellic acid pathway (9), ethylene biosynthesis pathway (22) and other genes of flower development (30). All of these unigenes are listed in Table S2 and may be involved in flower-related biological processes, such as flower development and formation. The identification and analysis of these key genes will lay the foundation for understanding the molecular genetic mechanisms that control different aspects of amaryllis flower development.
Genetic diversity of SSR loci. Among a total of 6,599 SSR loci, 6,460 loci with appropriate sequences were retained for marker design, and 335 SSR loci were randomly selected for the synthesis of PCR primer pairs, comprising 129 di-, 142 tri-, and 64 tetranucleotide repeats. In an initial screen of 335 SSR primers (Table S4)       FP257). The mean Nei's gene diversity (H) of these markers was 0.264, ranging between 0.055 and 0.394. The average Shannon's Information index (I) value was 0.41, ranging between 0.172 and 0.567. The FP083 and FP249 markers exhibited the lowest and highest genetic diversity, respectively (Table 3).
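For readers unfamiliar with the two diversity statistics reported above, the following Python sketch shows how H and I are commonly defined for a single two-allele locus scored as band presence ('1') or absence ('0'). It is a simplified illustration under stated assumptions and is not the POPGENE implementation: the function names are hypothetical, and the allele-frequency estimate assumes Hardy-Weinberg equilibrium with a recessive null allele, which may not hold in polyploid amaryllis.

```python
import math


def allele_freq_from_bands(scores):
    """Estimate the frequency p of the band ('1') allele from 0/1 scores.

    Assumption: Hardy-Weinberg equilibrium with a recessive null allele,
    so the frequency of band absence equals q squared.
    """
    absent = scores.count(0) / len(scores)
    return 1.0 - math.sqrt(absent)


def nei_gene_diversity(p):
    """Nei's gene diversity for a two-allele locus: H = 1 - (p^2 + q^2)."""
    q = 1.0 - p
    return 1.0 - (p * p + q * q)


def shannon_index(p):
    """Shannon's information index: I = -(p ln p + q ln q)."""
    return -sum(x * math.log(x) for x in (p, 1.0 - p) if x > 0)
```

Under these definitions both statistics peak when the two alleles are equally frequent (H = 0.5 and I = ln 2, roughly 0.693), so the reported means of 0.264 and 0.41 indicate moderate diversity on this scale.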
The genetic relationships and population structure of cultivated amaryllis. We constructed a neighbour-joining (N-J) tree using the software MEGA to infer the genetic relationships amongst the 104 Hippeastrum accessions. The resultant dendrogram demonstrates that all amaryllis cultivars and hybrids can be clearly separated from one another and fall into three clusters: cluster I (35 accessions), cluster II (31 accessions), and cluster III (38 accessions) (Fig. 4a). Each of these clusters is further separated on the N-J tree into two sub-clusters. For example, sub-cluster I within cluster I mainly comprises the 18 accessions between 'Orange' and 'Flamingo', 14 of which are from the Netherlands and four of which are from South Africa. Sub-cluster II primarily comprises the 17 Hippeastrum accessions between 'Double king' and 'Mont Blanc', which is composed of   South Africa. Finally, cluster III contains 38 accessions, 29 of which are from the Netherlands and nine of which are from South Africa. It is also noteworthy that the accessions within each cluster can also be described on the basis of their petal number (single or double).
The genetic structure of the 104 accessions, based on the 21 SSR markers, was examined using the model-based software STRUCTURE. K values between 1 and 15 were tested, and the most likely number of groups was determined by calculating Delta K (ΔK). The highest ΔK values were observed when K = 5 (ΔK = 51.92), K = 6 (ΔK = 41.17), K = 7 (ΔK = 44.54), and K = 8 (ΔK = 33.35). These results suggest that the accessions can be classified into five, six, seven, or eight sub-groups, respectively (Fig. 4). The genetic structure of each accession is illustrated using a bar plot to enable a simple comparison with the N-J tree (Fig. 4).

Discussion
Although amaryllis represents an economically important plant group that has become increasingly influential in research on ornamental flowers, a general lack of information about the SSR markers within its transcriptome has hampered genetic research. The results presented here are consistent with previous studies that have developed limited genetic resources for other members of Amaryllidaceae [21][22][23] and demonstrate that transcriptome analysis and marker development are possible for this flowering plant.
Developing genetic resources for amaryllis through transcriptomics. Transcriptome sequencing studies are highly valuable for the discovery of molecular markers, as well as for characterization and variant analysis as components of the identification of novel genes 24, and they provide the basis of our knowledge of genome intricacy. Transcriptome sequencing is particularly valuable in the case of flowering plants, such as amaryllis, that have very large genomes and therefore represent a sequencing challenge. A total of 27,273,776 clean reads obtained from the Illumina HiSeq2000 platform were used to assemble the transcriptome of H. hybridum cv. Blossom Peacock in this study. Of note, the Roche 454 platform was initially favoured for transcriptome assessment in this field given its longer read length, its capacity to compensate for deficiencies in the reference genome, and the fact that it was used successfully for L. aurea sequencing in a previous study 21. Although Illumina platform sequencing is limited in some respects, recent procedural developments in assembly have enabled high-quality de novo assembly for a number of species, including L. sprengeri 23 (Amaryllidaceae). This study further illustrates successful de novo assembly in amaryllis, combined with analysis of diversity data, on the basis of 380,731 transcripts representing 171,462 unigenes from 28,974,793 raw reads (Table 1). Unigene length is relevant here because the average length of unigenes is generally assumed to be a sign of high-quality assembly 26. One obvious feature of our H. hybridum cv. Blossom Peacock assembly is its unigene length (mean = 492 bp, N50 = 610 bp; Table 1), which is greater than the average length previously recovered for L. sprengeri 23 (385 bp) using the Illumina HiSeq2000 platform. Comparison with Roche 454 platform sequencing of L. aurea (329 bp) 21 further confirms the effectiveness of our H. hybridum cv. Blossom Peacock transcriptome assembly.
One possible explanation for the longer mean length recovered in amaryllis might be the broader sequencing coverage provided by the Illumina platform. In addition, sequence comparisons, annotation, and marker validation may also be important factors that contribute to the quality of de novo assembly.
Results indicate that the amaryllis sequences assembled in this study are diverse compared with those from existing similar data and that 47,359 (27.62%) and 27,089 (15.80%) unigenes exhibit significant homology versus the Nr and Swiss-Prot databases, respectively. These observations suggest that sequence contiguity remains consistent over parts of the annotated transcriptome; thus, unigenes that do not match with existing databases might either be indicative of novel amaryllis-specific genes or correspond with non-coding regions, pseudogenes, or short transcript lengths. Similar BLAST-based methods have been utilized in previous studies. For example, 45,052 (45.9%) L. sprengeri unigenes and 66,197 (46.91%) L. aurea unigenes were matched significantly with known counterparts in public databases. This finding suggests that the amaryllis sequence overlaps considerably with previously collected data.
Several methods were utilized in this study to analyse the biological function of assembled transcripts, including GO, COG, and KEGG. The GO analysis results revealed that 20.50% of unigenes proceeded through this annotation and that it was a beneficial approach for determining the diversity of gene functions. To gain further insights on specific metabolic and cellular processes controlled by gene functions in the context of genetically and biologically complex behaviours 27 , we mapped 8.25% of our annotated unigenes against the COG database and 10.53% to KEGG terms. These results are similar to those previously reported for L. sprengeri and L. aurea and indicate that the transcripts generated in this study for H. hybridum cv. Blossom Peacock are high quality and might be applicable to future studies that address amaryllis genetic cloning, molecular heredity, and transgenesis.
In addition, single-gene annotation and pathway prediction have accelerated the discovery of key genes associated with the development and function of amaryllis flowers. In total, we obtained 269 genes related to eight categories: the anthocyanin biosynthetic pathway (125), carotenoid biosynthesis pathway (30), specification of floral organ identity (13), photoperiod pathway (32), vernalization pathway (8), gibberellic acid pathway (9), ethylene biosynthesis pathway (22) and other genes of flower development (30). These unigenes indicate that relatively accurate, high-quality sequence databases can be generated through de novo transcriptome analysis of non-model plant species.
Detection and validation of amaryllis SNP and SSR markers. SSR (microsatellite) markers have a number of advantages compared with other marker systems and are thus of considerable importance to our understanding of genetic diversity, map construction, comparative genomics, and molecular breeding 28. Identification of a number of SSRs was therefore undertaken as part of this study to enrich the known genomic resources of amaryllis. The mean frequency detected in H. hybridum cv. Blossom Peacock corresponds to one SSR locus per 4.2 kb, which is a higher frequency than that previously reported for coloured calla lily (Zantedeschia hybrid) 29 (i.e., one SSR locus per 7.27 kb) but slightly lower than that for Zantedeschia rehmannii 30 (i.e., one SSR locus per 4.1 kb). These differences may be due to the use of different standard repeat units and length thresholds 31, the number of data libraries searched, and the SSR identification tools used 32. We also determined that 51.8% of our total SSRs are trinucleotide repeats, followed by dinucleotide (45.5%) and tetranucleotide repeats (2.3%), as well as pentanucleotide and hexanucleotide repeats, which collectively comprise 0.4% of the motifs.
One important characteristic of amaryllis appears to be that the frequency of motif repeats simultaneously decreases in contrast to that of Z. rehmannii 30. Thus, compared with some other plants 33, our results suggest that trinucleotide repeats are the most common genetic motif of this type in amaryllis. Within these repeats, AAG/CTT is the most abundant (31.2%). Further, of a total of 335 SSR primer pairs, 154 successfully amplified polymorphic bands in Hippeastrum accessions. This relatively low proportion (about 46%) of successful, polymorphic amplification in the 17 accessions screened may be due to primers that span introns, assembly errors, and the high heterozygosity of the amaryllis genome. We selected a total of 21 primer pairs, almost all of which were able to provide single, clear amplification products in all 104 varieties. This result suggests that the SSRs developed in this study are sufficient to determine the relationships amongst the accessions used in this study despite the fact that the short and robust nature of these repeats in amaryllis has previously prevented a clear evaluation of genetic diversity.
Single nucleotide polymorphisms (SNPs) have recently become popular markers for high-density genetic mapping, association mapping and population genetic structure studies 34. We therefore also detected SNPs in the H. hybridum cv. Blossom Peacock transcriptome. The mean frequency detected in this cultivar corresponds to one SNP locus per 7.87 kb, a wider spacing (i.e., a lower frequency) than that of SSR loci in this report (one per 4.2 kb). On the other hand, this finding may indicate the polyploidy or genetic complexity of the Hippeastrum variety.
We resolved this issue in this study by mining novel microsatellite loci that will expedite a more comprehensive assessment of amaryllis in future research.
Assessing the genetic diversity of amaryllis accessions. As molecular markers, SSRs are of great importance to the assessment of genetic diversity. In total, 21 primer pairs were selected for further PCR validation in this study. Our results elucidate the genetic diversity among amaryllis cultivars and hybrids and reveal the effectiveness of our mined markers because significant H and I values were obtained and amaryllis accessions could be clearly distinguished. The percentage of polymorphic loci (94%) across all 21 primers detected in this study is higher than in previous studies of amaryllis accessions based on RAPD 13 (72.6%) and ISSR 15 (92.4%) markers. Both H and I values might also be useful to distinguish accessions to a certain extent (Table 3); the average H and I values were 0.26 and 0.41, respectively. These results, however, may not have reached the level required for meaningful comparisons with previous ISSR 15 and RAPD-based 13,14 diversity studies in amaryllis, in part given the higher number of samples. In addition, low diversity values (H, I) were observed for some markers, including FP083 (0.055, 0.121) and FP105 (0.124, 0.172), which is consistent with the results of previous studies 35. One reason for a lower number of alleles is possibly the fact that these accessions belong to different species, which might lead to an increased number revealed by a specific SSR marker 36,37. In addition, species breeding behaviour, the genetic diversity and size of the collection, the sensitivity of the genotyping method and the genomic locations of markers could all influence the data included in this research.
Cluster analysis and genetic structure within Hippeastrum spp. accessions. Some previous studies have regarded the number of amaryllis flowers (i.e., single or double) and colour as key reference standards to perform clustering and structural analysis given that a significant amount of breeding is performed based on these traits 38,39 . McCann presented the first discussion of an amaryllis double flower in 1937 on the basis of a wild type Amaryllis puniceum specimen collected from Cuba. Six known petal sources at the possible origin of accumulations, bracts, male and female stamen, taiwan pavilion, repetition, and inflorescence are known for most plants 40 . In a previous study, hereditary observations revealed that an increase in the number of amaryllis petals is derived from stamens in male and females and that the basal trait is dominant 40 . Given that previous studies have also demonstrated that the ABC and four factor models both determine the development of floral organs 41 within the same growth environment, both flower number and colour are most likely genetically regulated and can therefore be reliably reconstructed using clustering based on these traits.
However, the N-J tree and structural plot generated in this study (Fig. 4) demonstrate that the 104 accessions we considered cannot be fully clustered according to the number of flowers or their colour. Exceptions include the 'Lemon sorbet', 'Ballerina' and 'Chico' varieties. Of these varieties, the first possesses green single petals, whereas 'Ballerina' (red flowers) and 'Chico' (pink flowers) both have double petals. These exceptions are important because they illustrate the complex nature of amaryllis population structure.
We believe that there are two main explanations for the complex population structure of amaryllis. The first explanation involves the abundance, wide distributional range, and long breeding history of species within this plant group. Amaryllis are native to the mountainous areas of South America, including Brazil, Peru, Argentina, and Bolivia 42, and this ornamental plant has been bred in the UK since the 1690s. Although the first recorded Hippeastrum hybrid, 'Johnsonii' (H. vittatum × H. reginae), was documented in the UK in 1799, the introduction of these Central and South American flower crops into Europe occurred later, during the 19th century. After numerous breeding experiments, Hippeastrum was divided into four key hybrid groups: Reginae, Leopoldii, Vittatum and Reticulatum 9. Amaryllis gradually became a very popular flower in the West, with more than 300 cultivars 5. As amaryllis and its hybrids are now cultivated in many countries around the world, including the Netherlands, South Africa, Japan, Brazil, and the USA, the genetic relationships between the original founder species and many cultivars and hybrids have been obscured by long-term artificial cultivation and breeding. The second explanation is the existence of polyploidy in amaryllis: H. hybridum is a bulbous perennial with a ploidy level that ranges between diploid and octoploid 2. Examples include H. aulicum, H. machupichense, H. psittacinum, and H. solandrifoliu (2n = 22); H. forgeti (2n = 2x + 1 = 23) and H. argentinum (2n = 33); H. reginae (2n = 44); and H. rutilum (n = 55) 43. As a direct result of this diversity, amaryllis has gradually become an increasingly popular flower globally. This polyploidy has resulted in large floral organs and a high yield, which have both attracted researchers. Thus, a large flower has always been a clear goal of breeding experiments 38.
These characteristics also explain why the complex structure of amaryllis relationships has proved difficult to untangle. The tree-based clustering and structural analysis presented in this study do not reveal any obvious associations with the geographic origin of accessions, but this finding might also be related to the fact that the accessions were derived only from the Netherlands and South Africa. Further studies involving larger samples derived from a wider range of geographical regions are required to develop more generalized conclusions regarding the divergence and population structure of Hippeastrum accessions.
The aim of this study was to develop and assign SSRs to elucidate the genetic diversity of 104 Hippeastrum accessions. We confirmed the potential of transcriptome sequencing in the case of species where genomic information is lacking such as Hippeastrum. SSR resources have also been shown to be useful for genetic detection and localization of this species and may help improve the plant's floral quality.

Materials and Methods
Plant material and DNA extraction. The 104 amaryllis cultivar and hybrid specimens used in this study (Table S5) (85 from the Netherlands and 19 from South Africa) were housed in an experimental greenhouse at the Beijing Academy of Agriculture and Forestry Sciences, Beijing, China. All specimens were planted in fertile land containing a mixture of sand and soil (1:1, v/v) and were grown in a completely randomized design at temperatures between 18 and 22 °C from April to November 2014. All accessions were grown under similar conditions and were irrigated and fertilized monthly. Fresh young leaves were collected in the spring, immediately frozen in liquid nitrogen and stored at −80 °C until they were used for DNA extraction. Total DNA was separately extracted from each specimen using a DNeasy Plant Mini kit according to the manufacturer's instructions (Zexing Biotech, Beijing, China). The quality and quantity of genomic DNA were assessed by electrophoresis on a 1% (w/v) agarose gel and with a Qubit® 2.0 Fluorometer.
RNA isolation, library preparation, Illumina sequencing and de novo transcriptome assembly. Sample flowers were also frozen directly in liquid nitrogen and stored at −80 °C for RNA extraction. Total RNA from each sample was extracted using an RNeasy Plant Mini Kit (Zexing Biotech, Beijing, China) according to the manufacturer's instructions. The quality and quantity of RNA were assessed using a NanoDrop ND 2000C spectrophotometer (Thermo Scientific, USA) and a Bioanalyzer (Agilent Technologies, USA). Poly(A)+ mRNA was enriched with magnetic beads coupled to oligo(dT)25, and equal quantities of poly(A)+ mRNA from each sample were mixed and used as templates to construct cDNA libraries. These libraries were synthesized using the standard Illumina pipeline (Illumina, San Diego, CA, USA) according to the manufacturer's instructions at the Yuanquan Yike Biotechnology Company Limited (Beijing, China).
Raw reads were first quality-filtered by trimming adaptor sequences and removing low-quality reads; the resulting clean reads were then assembled de novo using the software Trinity 25 and used for subsequent analyses. All Illumina sequencing reads for H. hybridum cv. Blossom Peacock were submitted to the NCBI Sequence Read Archive under the accession number SAMN05149972.
Functional annotation and classification of unigenes. Functional annotation and classification of the assembled unigenes followed the method previously described by Meyer et al. 44. Assembled unigenes were searched against the NCBI Nr and Swiss-Prot protein databases using BLASTX 44 with an E-value cut-off of 10⁻⁵. BLAST results were then imported into Blast2GO (Version 2.5.0), and each sequence was assessed against the GO database using default parameter settings. All unigene sequences were also aligned to the COG database to predict and classify their possible functions. Biological pathway information indicative of molecular interactions and reaction networks was annotated based on the KEGG pathway database.
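The hit-filtering step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the pipeline used in the study: it assumes that the BLASTX results were written in the standard 12-column tabular format (BLAST+ `-outfmt 6`), it applies the 30% threshold to the percent-identity column as a stand-in for the similarity criterion mentioned in the Results, and the unigene and protein identifiers in the sample are hypothetical.

```python
import csv
import io


def filter_hits(handle, max_evalue=1e-5, min_identity=30.0):
    """Return the best-scoring passing hit per query from tabular BLAST output.

    Columns follow BLAST+ outfmt 6: qseqid, sseqid, pident, length, mismatch,
    gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        identity, evalue, bitscore = float(row[2]), float(row[10]), float(row[11])
        if evalue > max_evalue or identity <= min_identity:
            continue  # fails the E-value or identity threshold
        if query not in best or bitscore > best[query][3]:
            best[query] = (subject, identity, evalue, bitscore)
    return best


# Hypothetical example records (identifiers are placeholders, not real data).
SAMPLE = "\n".join([
    "unigene_1\tsp|P00001\t85.0\t200\t30\t0\t1\t600\t1\t200\t2e-30\t250",
    "unigene_1\tsp|P00002\t60.0\t150\t60\t0\t1\t450\t1\t150\t1e-12\t120",
    "unigene_2\tsp|P00003\t28.0\t100\t72\t0\t1\t300\t1\t100\t1e-8\t60",
    "unigene_3\tsp|P00004\t45.0\t90\t50\t0\t1\t270\t1\t90\t0.002\t40",
])

annotated = filter_hits(io.StringIO(SAMPLE))
```

In this toy input only `unigene_1` is retained: `unigene_2` falls below the identity threshold and `unigene_3` exceeds the E-value cut-off. Dividing the number of retained queries by the total number of unigenes gives annotation rates of the kind reported for the Nr and Swiss-Prot databases.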

Search for genes of interest.
We collected genes from different species that are involved in the anthocyanin biosynthesis pathway, carotenoid biosynthesis pathway, specification of floral organ identity, photoperiod pathway, vernalization pathway, gibberellic acid pathway, and ethylene biosynthesis pathway, together with other genes of flower development (Table S2), and used them to screen our annotated contigs by similarity in order to identify candidate genes that correlate with flower organ formation and double flower formation. These final results were used for further bioinformatics analysis 45.
The frequency distribution of SNPs. SNPs were detected using the CLC Genomics Workbench 6.
SSR marker verification in amaryllis accessions. All unigenes were retained to identify potential SSRs using the MISA tool (MIcroSAtellite) 46, and a series of searches was performed to identify di-, tri-, tetra-, penta-, and hexanucleotide motifs with a minimum of six, five, four, four, and four repeats 47, respectively. The software Primer Premier 5.0 (PREMIER Biosoft International, Palo Alto, CA) was then used to manually design 335 PCR primer pairs for selected SSRs with greater than 20 motif loci, which were then synthesized at Sangon Biotech (Shanghai, China) and used to validate polymorphisms in 17 amaryllis accessions. A total of 154 SSR primer pairs yielded significant and reproducible products in this step, and 21 of these were then randomly selected to assess the diversity of 104 amaryllis accessions (Table S5).
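The repeat thresholds used for SSR detection can be illustrated with a short regular-expression scan in Python. This is a simplified stand-in and not MISA itself: the function name and the example sequence are hypothetical, only perfect repeats are detected, and compound or interrupted microsatellites are ignored.

```python
import re

# Minimum repeat counts from the text: di=6, tri=5, tetra=4, penta=4, hexa=4.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # One motif of unit_len bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. ATAT is
            # really AT, and AAA is a mononucleotide run).
            if any(
                motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)
```

For example, `find_ssrs("CC" + "AG" * 7 + "TTC" + "AAG" * 5 + "C")` reports an (AG)7 repeat at position 2 and an (AAG)5 repeat at position 19, whereas a run of only five AG units is below the dinucleotide threshold and is not reported.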
We performed PCR with each primer pair on DNA from all 104 accessions to amplify microsatellite-containing regions. PCR reactions (12.5 μl volume) were conducted using a GeneAmp PCR System 9700 (Applied Biosystems), with each reaction containing 4 μl of genomic DNA (5 ng/μl), 0.4 μl of each primer, 0.25 μl of high purity dNTPs, 0.2 μl of Taq10 DNA polymerase, 1.25 μl of 10 × Taq10 buffer I or II, and 6 μl of ddH2O. PCRs were conducted using a touchdown cycling program: an initial denaturation at 94 °C for five minutes was followed by 11 touchdown cycles of denaturation at 94 °C for 45 seconds, annealing for 40 seconds (starting at 60 °C and reduced by 1 °C per cycle to 50 °C), and extension at 72 °C for one minute. These were followed by 15 cycles of denaturation at 94 °C for 45 seconds, annealing at 50 °C for 40 seconds, and elongation at 72 °C for one minute, then a final extension at 72 °C for ten minutes and an indefinite hold at 4 °C. We performed electrophoresis on an 8% non-denaturing polyacrylamide gel (PAGE) with Tris/borate/EDTA (TBE) buffer at 150 V for one hour to separate PCR products before gels were silver stained as described in previous work on the coloured calla lily 29,30. SSR bands amplified at sizes between 100 bp and 400 bp were scored as '0' and '1', denoting 'absence' and 'presence', respectively.
Data analysis of genetic diversity and population structure. Values for N, NP, H, and I were calculated for each SSR locus using the software POPGENE 3.2 48, and an N-J dendrogram comprising the 104 amaryllis accessions was generated using the software packages PowerMarker version 3.25 and MEGA 5 29. The assignment of the 104 accessions to different clusters was then performed based on the 21 SSR markers using the model-based software STRUCTURE v2.3.3 49. This application analyses SSR marker data to assign accessions to clusters.
For each K value (ranging between 1 and 15), at least ten independent runs of the admixture ancestry model were performed with a 200,000-iteration burn-in period followed by 200,000 subsequent iterations. The best value for K clusters was then calculated following the method of Gilbert et al. 50, and the mean likelihood value, L(K), across all runs was estimated for each K-value. We used Delta K as our model choice criterion to indicate a clear peak for the most likely K-value, which was calculated using the formula ΔK = m(|L''(K)|) / s[L(K)] 51, where L''(K) = L(K + 1) − 2L(K) + L(K − 1) is the second-order rate of change of L(K), m denotes the mean across runs, and s[L(K)] is the standard deviation of L(K) across runs. We regarded the level of the Delta K value to be tantamount to a test of our structural analysis.
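The Delta K criterion can be sketched in Python as follows. This is an illustrative implementation under stated assumptions, not the code used in the study: the function name and the toy likelihood values are hypothetical, and the second-order difference is taken on the mean L(K) across replicate runs, which is one common way of evaluating the formula.

```python
from statistics import mean, stdev


def delta_k(lnp):
    """Compute Delta K from replicate log-likelihoods.

    lnp maps each K to a list of L(K) values from independent runs.
    Delta K is only defined for K values that have both neighbours
    (K - 1 and K + 1), so the smallest and largest K are skipped.
    """
    means = {k: mean(values) for k, values in lnp.items()}
    result = {}
    for k in sorted(lnp):
        if k - 1 not in means or k + 1 not in means:
            continue
        spread = stdev(lnp[k])  # s[L(K)] across replicate runs
        if spread == 0:
            continue  # Delta K is undefined when all runs agree exactly
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_diff / spread
    return result


# Hypothetical toy likelihoods (two runs per K), for illustration only.
toy = {
    1: [-1000.0, -1002.0],
    2: [-900.0, -902.0],
    3: [-890.0, -894.0],
    4: [-889.0, -891.0],
}
dk = delta_k(toy)
best_k = max(dk, key=dk.get)
```

In the toy data the likelihood rises sharply from K = 1 to K = 2 and then plateaus, so Delta K peaks at K = 2. With K tested from 1 to 15 as in this study, Delta K can be evaluated only for K = 2 to 14.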