Evaluation of the phenotypic diversity of Calamansi (Citrus microcarpa) germplasm in Hainan Island

Calamansi, or Philippine lime (Citrus microcarpa), is an important crop for the local economy of Hainan Island. No study has yet addressed Calamansi germplasm evaluation or cultivar development. In this study, data were collected from 151 Calamansi seedling trees, and 37 phenotypic traits were analyzed to investigate their genetic diversity. Cluster analysis and principal component analysis were conducted to provide a theoretical basis for Calamansi genetic improvement. The diversity analysis revealed the following. (1) The diversity indexes of the qualitative traits ranged from 0.46 to 1.39, and the traits with the highest genetic diversity were fruit shape and pulp color (H′ > 1.20); the diversity indexes of the quantitative traits ranged from 0.67 to 2.10, with the lowest values found for juice ratio (1.08) and petal number (0.67). (2) Cluster analysis of the phenotypic traits grouped the samples into 4 categories: the first group was characterized by a lower flesh Segment number per fruit (SNF) and a higher Oil cell number (OCN); the second group had 7 samples, all characterized by larger Crown breadth (CB), higher Yield per tree (YPT), larger leaves, higher Ascorbic acid (AA) content and lower Seed number per fruit (SNPF); the third group had 25 samples characterized by smaller Tree foot diameter (TFD), smaller Fruit shape index (FSI) and higher Total soluble solids (TSS) content; the fourth group had 87 samples characterized by shorter Petiole length (PEL), larger fruit, higher Juice ratio (JR), higher Stamen number (SN) and longer Pistil length (PIL). (3) Principal component analysis showed that the eigenvalues of the first 9 principal components were all greater than 1 and that their cumulative contribution rate reached 72.20%; the main contributing traits included single fruit weight, fruit diameter, tree height and tree canopy width. Finally, based on the comprehensive principal component values of all samples, the Calamansi individuals with higher scores were selected for further observation. This study concludes that the Calamansi seedling populations in Hainan Island hold great genetic diversity in various traits and can be useful for Calamansi variety improvement.

Calamansi (Citrus microcarpa), or Philippine lime, is an important local economic crop in Hainan, China. It originated in Southeast Asia, is mainly grown in Southeast Asia and the tropical regions of China, and has a long history of cultivation in Hainan Island. Calamansi fruit is rich in vitamin C, aromatic oils, carotenoids and other natural substances, which are reported to have many health benefits for humans, such as beneficial effects on the eyes, treatment of cough, asthma and high blood pressure, and prevention of arteriosclerosis [1][2][3]. Calamansi fruit has a fine texture and a sour taste, and Calamansi juice is widely enjoyed as a fresh condiment. However, commercially cultivated Calamansi trees are mostly seedling trees, and their genetic diversity and improvement have not been studied, which causes a series of problems such as an unstable commodity supply period and uneven fruit quality. Hainan Island is the main growing area of Calamansi in China. The investigation and evaluation of Calamansi germplasm in Hainan Island are therefore of great significance for the genetic improvement of Calamansi fruit quality.
Phenotypic traits are intuitive manifestations of the quality of germplasm resources and an important indicator for genetic improvement. The diversity of phenotypic traits was the comprehensive performance of the

Results
Genetic diversity. A total of 8,511,230 SNP loci were obtained, and a phylogenetic tree was constructed from these SNP loci using Phylip software. The phylogenetic tree showed that the 100 individuals could be divided into 5 groups. Most individuals were related to each other at different levels, except for 2 individuals (L-N6R62C6 and L-N3R19C10) that showed a very simple relationship to their common ancestor (Fig. 1). This result indicates that the existing Calamansi populations in Hainan Island have a fairly high level of genetic diversity, despite the highly polyembryonic nature of Calamansi seeds.
Phenotypic traits diversity. Quality traits. The names, abbreviations and units of all traits are shown in Table 1, and the detailed scoring criteria are shown in Supplementary Table 1. The 8 quality traits were divided into 39 levels (Table 2), 34 of which had distribution frequencies ranging from 0.66 to 84.11% of the samples. Five traits had a weak frequency of distribution: TGV (tree growth vigor), OFS (oval fruit shape), DCFBS (deep concave fruit base shape), DCFTS (deep concave fruit top shape) and CFTS (convex fruit top shape). Nine traits had an effective percentage of less than 5%, meaning that only a few individuals in the population exhibited these phenotypes: TP (tree performance), LB (leaf base), OFS (oblate fruit shape), ObvFS (obovate fruit shape), PFS (pyriform fruit shape), LNFBS (long neck fruit base shape), FFBS (flat fruit base shape), NCFBS (neck collar fruit base shape) and RFTS (round fruit top shape). Traits with an effective percentage greater than 80% included DTP (draped tree performance) and LNFBS (long neck fruit base shape), indicating that these traits are relatively stable.
The Shannon-Wiener diversity index (H′) of the quality traits ranged from 0.46 to 1.39. Fruit shape (FS) and Pulp color (PC) had H′ > 1.20 and are considered to have high genetic diversity [4]. Tree shape (TS) and Tree performance (TP) had lower genetic diversity (H′ < 0.60). The total diversity value of the 8 quality traits was 6.53; the 4 fruit traits had a combined diversity value of 4.29, accounting for 65.7% of the total.
The Shannon-Wiener diversity indexes (H′) of the 29 quantitative traits ranged from 0.67 to 2.10. The lower indexes of Juice ratio (JR) (1.08) and Petal number (PN) (0.67) indicate that the phenotypic variation of these traits was relatively small, or that the distribution of their phenotypes was uneven. Apart from these two traits, all quantitative traits had H′ greater than 1.2, reflecting rich phenotypic variation with a relatively uniform distribution across phenotypes.
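The index values reported above can be reproduced in principle from class frequencies. The following is a minimal sketch, assuming the natural logarithm and the 10-level grading of quantitative traits (level 1 below the mean minus 2 s, level 10 at or above the mean plus 2 s, 0.5 s per level) described in the methods; the function names and the example counts are hypothetical and are not taken from the study's data.

```python
import math
from statistics import mean, stdev

def shannon_index(counts):
    """H' = -sum(p_i * ln p_i) over classes with non-zero frequency."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def grade_quantitative(values):
    """Assign each value to one of 10 levels: level 1 is < mean - 2s,
    level 10 is >= mean + 2s, and each level in between spans 0.5 s.
    Assumption: s is the sample standard deviation."""
    m, s = mean(values), stdev(values)
    levels = []
    for v in values:
        z = (v - m) / s
        # floor((z + 2) / 0.5) is 0..7 inside [-2s, +2s); shift to levels 2..9
        level = 2 + math.floor((z + 2.0) / 0.5)
        levels.append(min(10, max(1, level)))
    return levels

def quantitative_shannon(values):
    """Grade a quantitative trait into 10 levels, then compute H'."""
    levels = grade_quantitative(values)
    counts = [levels.count(k) for k in range(1, 11)]
    return shannon_index(counts)

# Hypothetical example: four equally frequent classes give ln(4)
print(round(shannon_index([10, 10, 10, 10]), 2))  # 1.39
```

Under these assumptions the index is bounded by the logarithm of the number of occupied classes, so a trait graded into 10 levels cannot exceed ln 10 (about 2.30), and a low value signals either few occupied levels or a very uneven distribution.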
Correlation analysis of quantitative traits. Correlation analysis of the quantitative traits showed that a total of 149 pairs of traits were significantly correlated, of which 84 pairs were positively correlated and 65 pairs were negatively correlated (Supplementary Table 2).
Cluster analysis. The Ward method was used to conduct a cluster analysis of the 29 quantitative traits of the 151 individuals, which were divided into 4 groups (Fig. 3). The first group contained 32 individuals and was mainly characterized by a lower flesh segment number per fruit (SNF) and a higher oil cell number (OCN) in the fruit peel. The second group included 7 individuals and was mainly characterized by larger crown breadth (CB), higher yield per tree (YPT), larger leaves, higher ascorbic acid (AA) content and lower seed number per fruit (SNPF). The third group had 25 individuals and was mainly characterized by smaller tree foot diameter (TFD), smaller fruit shape index (FSI) and higher total soluble solids (TSS). The fourth group had 87 individuals and was characterized by shorter petiole length (PEL), larger fruit, higher juice ratio (JR), higher stamen number (SN) and longer pistil length (PIL).
Principal component analysis and comprehensive evaluation. In this study, principal component analysis was performed on the 29 quantitative traits. The eigenvalues of the first 9 principal components were greater than 1 (Fig. 4), and their cumulative contribution rate reached 72.20%, indicating that the first 9 principal components can represent most of the information in the 29 quantitative traits of Calamansi (Table 4). PC1 had the largest contribution rate, 22.66%. The traits with the larger eigenvector values were Fruit weight (FW), Fruit length (FL) and Pistil length (PIL), indicating that the first principal component was mainly affected by traits related to pistil length and fruit size. The contribution rate of PC2 was 12.97%, and the traits with the larger eigenvector values were Tree height (TH) and Crown breadth (CB), indicating that the second principal component was mainly affected by traits related to the tree. The contribution rate of PC3 was 7.06%, and the trait with the largest eigenvector value was Titratable acidity (TA), indicating that the third principal component was mainly affected by the titratable acid content. The contribution rate of PC4 was 6.24%, and the trait with the largest eigenvector value was Ascorbic acid (AA). The contribution rate of PC5 was 5.40%, and the trait with the largest eigenvector value was Branch width (BW). The contribution rate of PC6 was 4.57%, and the trait with the largest eigenvector value was Branch node length (BNL). The contribution rate of PC7 was 4.49%, and the traits with the largest eigenvector values were Petal number (PN) and Juice ratio (JR). The contribution rate of PC8 was 4.03%, and the trait with the largest eigenvector value was Petiole length (PEL). The contribution rate of PC9 was 3.78%, and the trait with the largest eigenvector value was Leaf shape index (LSI).
The comprehensive evaluation showed that the comprehensive PC values of all samples ranged from 38.63 to 97.41 (Supplementary Table 3), with a mid-range value of 68.02. There were 33 samples with comprehensive PC values greater than this mid-range value (Table 5), accounting for 21.85% of all samples.

Discussion
Phenotypic traits reflect the combined effects of the plant genotype and the environment. Phenotype is an important manifestation of genetic variation, and it can directly indicate the abundance of specific genes. Phenotype is the basis for germplasm innovation and variety improvement [14]. In this study, the phenotypic traits of 151 Calamansi samples from Hainan Island were statistically analyzed and evaluated. Comprehensive evaluation methods have been used in the phenotyping and classification of other crops [18][19][20]. The results of this study could be used to select Calamansi individuals with outstanding traits. In addition, this research found that Calamansi seeds are highly polyembryonic, yet the diversity analysis of the Calamansi population gave relatively high diversity indexes, and the phenotypic evaluation also showed relatively high diversity among the traits analyzed. This interesting phenomenon might imply a high frequency of sprout (bud) mutation in the Calamansi germplasm population, which would cause relatively high genetic diversity in descendant populations after many generations of propagation by seed. Another possibility is that, over the long history of cultivation, open-pollinated zygotic embryos of Calamansi under growth pressure gradually became more competitive than the somatic embryos and developed into complete individuals, leading to the continuous evolution and phenotypic diversity of Calamansi. Finally, during data collection it was found that fruits harvested within the commercial standard weight range (10-13 g per fruit) had about twice as many seeds as fully mature Calamansi fruits. The reasons for this phenomenon and the mechanism of seed number reduction are unknown at present.
This study investigated 151 individuals of the Calamansi germplasm resources in Hainan Island and evaluated various phenotypic traits of cultivated Calamansi. The research provided information for the whole genome association analysis of Calamansi. The resulting data proved useful in the subsequent genome-wide association analysis, which established the connection between Calamansi phenotypes and the responsible genes.
This article is the first to investigate the germplasm of Calamansi in Hainan Island, China. Hainan Island is a geographically isolated tropical environment. Calamansi has been cultivated on the island for several hundred years and has undergone many generations of intentional or unintentional selection. Genetic variations (mutations) that were advantageous for growth or beneficial to growers were likely to survive and be preserved, and the accumulation of many such variations resulted in the genetic diversity of Calamansi in Hainan Island. This study can reflect the genetic characteristics of Calamansi to a certain extent. Calamansi is widely distributed in many countries in Southeast Asia and is widely used in the daily life of different cultures. In the future, Calamansi germplasm resources from across Southeast Asia will be collected and analyzed, which will allow the genetic characteristics of Calamansi to be studied more accurately and provide more valuable references for Calamansi breeding and cultivar improvement.

Conclusion
In this study, the phenotypic traits of the Calamansi seedling populations in Hainan Island were evaluated for the first time. The study identified elite individuals for various traits and provided plant materials and data to support subsequent Calamansi breeding. Calamansi is a widely cultivated "cash crop" in Hainan Island and plays an important role in the local economy, especially for farmers who have only small plots of land available. We systematically evaluated 37 phenotypic traits of the Calamansi seedling populations and found a high level of genetic diversity among these populations for those traits. Existing Calamansi populations can serve as a genetic resource for Calamansi variety development.

The Shannon-Wiener diversity index was calculated as H′ = −Σ Pi ln Pi (i = 1, …, n), where H′ was the diversity index, n was the total number of classes, and Pi was the effective percentage of the material distribution frequency in the i-th class of the trait. Quality traits were calculated directly from the effective percentage of each grade. For quantitative traits, the overall mean (x̄) and standard deviation (s) were calculated, and the values were divided into 10 levels, from the first level (< x̄ − 2 s) to the tenth level (≥ x̄ + 2 s), with every 0.5 s forming one level. The correlation between quantitative traits was calculated using Pearson's correlation coefficient, and the principal components of the quantitative traits were extracted using dimensionality reduction analysis and factor analysis (SPSS 25.0). Finally,

Genetic diversity analysis. Sequencing of Calamansi genome and SNP identification.
In this study, after preliminary analysis of the phenotypic traits of the 151 Calamansi tree samples, 100 trees with rich phenotypic characteristics were selected for genome sequencing. The library was constructed and sequenced on the Illumina sequencing platform, and 350 G of raw data were obtained. Fastp software was used to perform quality control on the sequencing data, and the quality-controlled data were then aligned to the genome of Citrus clementina [12] (https://www.citrusgenomedb.org/analysis/156) to obtain the corresponding alignment information. GATK 4.0 software was then used to perform variant calling on the genome sequence data of the 100 individuals to obtain the corresponding gvcf files. Finally, all the gvcf files were merged into vcf files, and the vcf files were further filtered to obtain the SNP sites of the 100 Calamansi individuals. Default parameters were used for all software when processing the data.
Construction of phylogenetic tree. The phylogenetic tree of Calamansi was constructed with Phylip software based on the neighbor-joining method. The specific code is as follows: Run_pipeline.

The larger the absolute value of a trait's characteristic vector, the greater its influence on the PC. The one or several traits with the largest absolute values of the characteristic vector under a PC can be considered to control that PC to a certain extent. The squared values of the characteristic vector (component loadings) under each PC are summed to obtain the eigenvalue (E) of that PC. The software converts each eigenvalue into a contribution rate. In theory, the contribution rates of all PCs sum to 1, which fully explains all the information in the original variables. According to the eigenvector matrix and the standardized phenotype data, all samples were comprehensively evaluated [4]. The specific scoring formula was as follows: Fn = −0.042 X1 − 0.339 X2 − 0.395 X3 + … + 0.681 X27 + 0.367 X28 + 0.824 X29, where Xi is the standardized value of the i-th quantitative trait. The comprehensive principal component value F was then calculated according to the proportion of the characteristic value corresponding to each principal component, with this proportion serving as the weight of each extracted principal component: F = 0.227 F1 + 0.140 F2 + 0.071 F3 + … + 0.045 F7 + 0.040 F8 + 0.038 F9.
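The scoring procedure described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions rather than the SPSS workflow used in the study: it assumes NumPy, runs PCA on the correlation matrix of the standardized traits, uses unit eigenvectors as the coefficients of each Fn (the text does not state whether eigenvectors or component loadings were used), fixes the arbitrary eigenvector signs by a simple convention, and weights each retained PC by its contribution rate. The function name `pca_composite_scores` and the random demonstration data are hypothetical.

```python
import numpy as np

def pca_composite_scores(data, eigen_threshold=1.0):
    """data: (n_samples, n_traits) array of raw quantitative trait values."""
    X = np.asarray(data, dtype=float)
    # Standardize each trait (z-scores), so PCA runs on the correlation matrix
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    corr = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]            # re-sort to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Eigenvector signs are arbitrary; make the largest-magnitude entry positive
    idx = np.abs(eigvecs).argmax(axis=0)
    signs = np.sign(eigvecs[idx, np.arange(eigvecs.shape[1])])
    eigvecs = eigvecs * signs
    contribution = eigvals / eigvals.sum()       # contribution rate of each PC
    keep = eigvals > eigen_threshold             # retain PCs with eigenvalue > 1
    scores = Z @ eigvecs[:, keep]                # F1..Fk for every sample
    composite = scores @ contribution[keep]      # F = sum of w_k * F_k
    return eigvals, contribution, scores, composite

# Synthetic stand-in with the study's dimensions (151 trees, 29 traits);
# these are random numbers, not the study's measurements.
rng = np.random.default_rng(0)
demo = rng.normal(size=(151, 29))
eigvals, contribution, scores, composite = pca_composite_scores(demo)
ranking = np.argsort(composite)[::-1]            # highest composite score first
```

Because the eigenvalues of a correlation matrix sum to the number of traits, each contribution rate is simply the eigenvalue divided by 29 here, and samples can then be ranked by the composite value to shortlist individuals for further observation.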
Cluster analysis. The statistical analysis software SPSS 25.0 was used to carry out the cluster analysis, and the Ward method was used to cluster the 151 individuals on the basis of the 29 quantitative traits. The 151 individuals were divided into 4 categories. The Ward method is an alternative approach to cluster analysis; it treats cluster analysis as an analysis of variance problem instead of using distance metrics or measures of association. It is an agglomerative clustering algorithm: it starts with n clusters of size 1 and continues merging until all the observations are included in one cluster. This method is most appropriate for the cluster analysis of quantitative variables.
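The agglomerative procedure described above can be illustrated with a small pure-Python sketch. It is not the SPSS implementation: it assumes the traits have already been standardized, and at each step it merges the two clusters whose union gives the smallest increase in the within-cluster sum of squares, which for centroids a and b with sizes na and nb equals na·nb/(na+nb) times their squared Euclidean distance. The function name `ward_clusters` and the example points are hypothetical.

```python
def ward_clusters(points, k):
    """Agglomerative clustering with Ward's criterion.
    points: list of equal-length numeric tuples; k: number of clusters to stop at.
    Returns a list of clusters, each a sorted list of point indices."""
    # Each cluster stores (member indices, centroid)
    clusters = [([i], list(p)) for i, p in enumerate(points)]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ma, ca), (mb, cb) = clusters[a], clusters[b]
                na, nb = len(ma), len(mb)
                d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
                # Increase in within-cluster sum of squares if a and b merge
                cost = na * nb / (na + nb) * d2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        na, nb = len(ma), len(mb)
        # Size-weighted mean of the two centroids is the merged centroid
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append((ma + mb, centroid))
    return [sorted(members) for members, _ in clusters]

# Hypothetical two-trait example with three visually separate groups
demo_points = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (10.0, 0.0)]
print(ward_clusters(demo_points, 3))
```

This naive version recomputes all pairwise merge costs at every step, which is acceptable for a few hundred individuals; stopping at k = 4 corresponds to cutting the dendrogram into the four groups reported in this study.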
Correlation analysis of quantitative traits. Correlation analysis of the 29 quantitative traits across the 151 Calamansi individuals was carried out with the statistical analysis software SPSS 25.0; the directions and levels of the correlations among the 29 quantitative traits are shown in Fig. 2. Pearson's correlation coefficient was calculated from the data of the 29 quantitative traits to describe the relationship between each pair of traits.
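The pairwise calculation described above can be sketched in a few lines. This is a minimal pure-Python illustration, not the SPSS procedure: the function names and the small example values are hypothetical, and the significance test is shown only as the usual t statistic with n − 2 degrees of freedom, without the p-value lookup.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_pairs(traits):
    """traits: dict mapping trait abbreviation -> list of values (same length).
    Returns {(trait_a, trait_b): r} for every unordered pair of traits."""
    names = list(traits)
    return {
        (a, b): pearson_r(traits[a], traits[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }

def t_statistic(r, n):
    """t value for testing r against zero with n - 2 degrees of freedom.
    Undefined for |r| == 1."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# Hypothetical values for three traits measured on four trees
demo_traits = {
    "FW": [10.0, 12.0, 14.0, 16.0],
    "FL": [2.0, 2.5, 2.9, 3.6],
    "SNPF": [9.0, 7.0, 6.0, 4.0],
}
pairs = correlation_pairs(demo_traits)
positive = [p for p, r in pairs.items() if r > 0]
negative = [p for p, r in pairs.items() if r < 0]
```

With 29 traits there are 29 × 28 / 2 = 406 unordered pairs, so the 149 significantly correlated pairs reported in this study are a subset of those 406 comparisons.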
Ethical approval. The collected plant materials and research activities are in accordance with the laws and regulations of Hainan Province, China.
The collection of Calamansi resources has been approved by the grove owner Ming Bo Scientific Technology Co., Ltd.

Data availability
The data were collected by YHX and YXW. The materials were collected from the farm of Ming Bo Scientific Technology Co., Ltd.