Determining the geographical origin of common buckwheat from China by multivariate analysis based on mineral elements, amino acids and vitamins

This study aimed to establish a method for distinguishing the geographical origin of common buckwheat from Inner Mongolia, Shanxi and Shaanxi Provinces in China. Three chemical families (mineral elements, vitamins and amino acids) in 48 samples from these regions were analyzed by principal component analysis (PCA), cluster analysis (CA) and linear discriminant analysis (LDA). LDA clearly discriminated the geographical origin of the samples grown in the three regions, giving a correct classification rate of 95.8% and a cross-validation rate of 91.7%. Several variables (Mn, VPP, Se, Gly, Cu, Asp, Fe, and Ala) contributed significantly to the discrimination. These results demonstrate that the proposed method can serve as a powerful tool for government agencies to verify the geographical origin of common buckwheat and to protect consumers from improper domestic labeling. However, the discriminant method still needs to be validated further using more reliable data.

Profiles of amino acids. The amino acid characteristics of common buckwheat from the different regions are presented in Table 2. The mean contents of Asp, Glu, Gly, Ala, Met and Lys differed significantly among samples from Inner Mongolia, Shanxi and Shaanxi (p < 0.05), whereas no significant differences were found for the other amino acids. These differences in amino acid concentrations made it possible to distinguish samples from different regions and provided a basis for further statistical analysis.

Principal component analysis (PCA). To evaluate the differences among common buckwheat from different regions, the indicators with significant differences (p < 0.05) were processed by PCA. Table 3 shows the results of PCA and the subsequent discriminant analysis. The correct classification rates and cross-validation rates of model 1 (based on vitamin content), model 2 (based on mineral element content) and model 3 (based on amino acid content) were all no more than 50%. The highest correct classification rate (79.2%) and cross-validation rate (79.2%) were obtained with model 5, which combined the contents of amino acids, mineral elements and vitamins with the relative content of amino acids. Discrimination based on PCA was therefore insufficient to distinguish the origins of common buckwheat, and other statistical methods were needed to obtain better results.
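As an illustration of the PCA step, the following is a minimal sketch that standardizes the variables and keeps the principal components whose eigenvalue exceeds 1, the extraction rule stated in the Methods. It is not the SPSS procedure used in the study; the function name `pca_kaiser` and the synthetic data are assumptions made for the example.

```python
import numpy as np

def pca_kaiser(X):
    """PCA on standardized data, keeping components with eigenvalue > 1."""
    X = np.asarray(X, dtype=float)
    # Standardize each variable so that PCA operates on the correlation matrix.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    corr = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]              # re-sort in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0                           # eigenvalue-greater-than-1 rule
    scores = Z @ eigvecs[:, keep]                  # component scores per sample
    explained = eigvals[keep] / eigvals.sum()      # share of total variance
    return scores, eigvals, explained

# Synthetic stand-in for a 48-sample, 6-variable data set (not the paper's data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(48, 6))
X_demo[:, 1] += 2.0 * X_demo[:, 0]   # correlate two variables on purpose
scores, eigvals, explained = pca_kaiser(X_demo)
print(scores.shape, explained)
```

The retained component scores could then be passed to a discriminant analysis, which is how the PCA-based models in Table 3 appear to have been evaluated.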
Cluster analysis (CA). To better visualize the relative distribution of the common buckwheat samples, CA was performed on the variables with significant differences (p < 0.05). Samples were grouped into clusters according to their similarity, which was measured by the Mahalanobis distance. The smaller the distance, the closer the relationship, so the closest objects were considered to belong to the same group. Cutting the dendrogram at a distance of 60 separated the samples into three clusters (Fig. 1). The first cluster was composed of samples from Shaanxi (n = 4) and Shanxi (n = 5). The second cluster was composed of samples from Inner Mongolia (n = 5), Shaanxi (n = 4) and Shanxi (n = 13), and the third cluster was composed of samples from Inner Mongolia (n = 16) and only one Shanxi sample. These results indicate that CA gave a rough picture of the geographical distribution but did not reliably determine the geographical origin of common buckwheat, which was consistent with the PCA results. Thus, neither PCA nor CA applied to all variables enabled good discrimination of the geographical origin of common buckwheat.
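The clustering idea described above can be sketched as follows: pairwise Mahalanobis distances are computed, and the closest clusters are merged repeatedly. This is only an approximation of the study's setup: plain group-average linkage is used here in place of the flexible group average method of DPS, and the function names and synthetic data are assumptions made for the example.

```python
import numpy as np

def mahalanobis_matrix(X):
    """Pairwise Mahalanobis distances using the sample covariance of X."""
    X = np.asarray(X, dtype=float)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.einsum("ijk,kl,ijl->ij", diff, cov_inv, diff)
    return np.sqrt(np.maximum(d2, 0.0))

def average_linkage(D, n_clusters):
    """Naive agglomerative clustering with group-average linkage."""
    clusters = [[i] for i in range(len(D))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average distance between all cross-cluster pairs.
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]    # merge the closest pair
        del clusters[b]
    labels = np.empty(len(D), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels

# Three well-separated synthetic groups (not the paper's data).
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [20.0, 0.0], [0.0, 20.0]])
X_demo = np.vstack([c + rng.normal(size=(10, 2)) for c in centers])
D = mahalanobis_matrix(X_demo)
labels = average_linkage(D, n_clusters=3)
print(labels)
```

On real compositional data the groups overlap far more than in this toy example, which is consistent with the mixed clusters reported in Fig. 1.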
Table 3. Discrimination model based on PCA and their accuracy.
Linear discriminant analysis (LDA). To achieve better classification and identification of the common buckwheat samples from different regions, a stepwise discriminant procedure was carried out to extract the variables that best separated samples of different origins. Variables were entered or removed according to their effect on the discrimination of the groups, based on the Wilks' lambda criterion. Table 4 summarizes the classification of common buckwheat samples by the LDA models together with the cross-validation results. The correct classification rates of model 1 (based on amino acid content), model 2 (based on vitamin content), model 3 (based on mineral element content) and model 4 (based on relative content of amino acids) were 60.4%, 62.5%, 66.7% and 77.1%, and their cross-validation rates were 54.2%, 60.4%, 62.5% and 72.9%, respectively. This indicated that the mineral element, amino acid and vitamin compositions of common buckwheat from different origins were similar, making it difficult to distinguish the origins using a single chemical family alone. In model 5, which took the combination of mineral element, vitamin and amino acid contents as well as the relative content of amino acids as variables, the correct classification rate and cross-validation rate reached 95.8% and 91.7%, respectively. In this model, nine variables (contents of Mn, Se, Cu, Fe, VPP, Gly, Asp and Ala as well as the relative content of Ala) were selected as contributing significantly to the discrimination of geographical origin (Table 4), and two discriminant functions were constructed on the basis of Wilks' lambda values (Fig. 2). Together the two functions explained 100% of the variance (function 1 explained 58.1% of the total variance and function 2 explained 41.9%).
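For readers unfamiliar with the Wilks' lambda criterion, the sketch below computes it as the ratio of the determinants of the within-group and total sums-of-squares-and-cross-products matrices; values near 0 indicate strong group separation and values near 1 indicate none. It does not reproduce the stepwise entry and removal rules of SPSS, and the function name and synthetic data are assumptions made for the example.

```python
import numpy as np

def wilks_lambda(X, y):
    """Wilks' lambda = det(W) / det(T) for variables X and group labels y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    centered = X - X.mean(axis=0)
    T = centered.T @ centered                 # total SSCP matrix
    W = np.zeros_like(T)
    for g in np.unique(y):
        Xg = X[y == g]
        cg = Xg - Xg.mean(axis=0)
        W += cg.T @ cg                        # within-group SSCP matrix
    return np.linalg.det(W) / np.linalg.det(T)

# Synthetic data: one variable that differs by group, one that does not.
rng = np.random.default_rng(3)
y_demo = np.repeat(np.arange(3), 16)          # three groups, 48 samples
informative = (3.0 * y_demo + rng.normal(size=48)).reshape(-1, 1)
noise = rng.normal(size=(48, 1))
lam_informative = wilks_lambda(informative, y_demo)
lam_noise = wilks_lambda(noise, y_demo)
print(lam_informative, lam_noise)
```

A stepwise procedure would, at each step, favor the candidate variable that lowers the lambda of the selected set the most, subject to a significance threshold.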
The discriminant functions are as follows:
Function 1 = −7.557 − 14.603 Asp + 23.939 Gly + 12.366 Ala + 99.904 Ala (relative content) + 0.219 Cu − 0.008 Fe − 0.183 Mn + 7.281 Se − 0.360 VPP
Function 2 = −26.094 + 20.134 Asp − 6.402 Gly − 43.015 Ala + 612.306 Ala (relative content) − 0.001 Cu + 0.001 Fe − 0.081 Mn + 14.083 Se + 0.493 VPP
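To show how the two functions place a sample in the discriminant plane, the following sketch evaluates them for one set of measurements. The coefficients are transcribed from the functions above; the key `Ala_rel` (relative content of Ala) and all sample values are hypothetical, and the units must match those used when the functions were fitted, which this section does not state.

```python
# Coefficients transcribed from the two discriminant functions in the text.
# "Ala_rel" is a label chosen here for the relative content of Ala.
FUNCTION_1 = {"const": -7.557, "Asp": -14.603, "Gly": 23.939, "Ala": 12.366,
              "Ala_rel": 99.904, "Cu": 0.219, "Fe": -0.008, "Mn": -0.183,
              "Se": 7.281, "VPP": -0.360}
FUNCTION_2 = {"const": -26.094, "Asp": 20.134, "Gly": -6.402, "Ala": -43.015,
              "Ala_rel": 612.306, "Cu": -0.001, "Fe": 0.001, "Mn": -0.081,
              "Se": 14.083, "VPP": 0.493}

def discriminant_score(coefficients, sample):
    """Return the constant plus the sum of coefficient * measured value."""
    return coefficients["const"] + sum(
        weight * sample[name]
        for name, weight in coefficients.items() if name != "const")

# Hypothetical measurements for illustration only (not from the paper).
sample = {"Asp": 1.0, "Gly": 0.6, "Ala": 0.5, "Ala_rel": 0.05,
          "Cu": 6.0, "Fe": 100.0, "Mn": 12.0, "Se": 0.05, "VPP": 4.0}
point = (discriminant_score(FUNCTION_1, sample),
         discriminant_score(FUNCTION_2, sample))
print(point)
```

The resulting pair of scores is the sample's position in a plot such as Fig. 2; assigning it to a region would additionally require the group centroids, which are not given in this section.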
The separation of common buckwheat from Inner Mongolia, Shanxi and Shaanxi was checked by plotting the scores of the two functions (Fig. 2). Samples from the different regions were clearly distinguished from each other, confirming that the selected variables provided useful information for classifying common buckwheat. To evaluate its predictive capacity, the model was then validated by leave-one-out cross-validation; the LDA classification results of model 5 are summarized in Table 5. With the nine selected indicators, the correct classification rate reached 95.2%, 94.7% and 100% for common buckwheat from Inner Mongolia, Shanxi and Shaanxi, respectively. The predictive ability of the model was 91.7%, indicating satisfactory performance in classifying common buckwheat samples from the three regions.
Table 4. Observations of the cross-validation results and discrimination model.
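The reported rates for model 5 can be cross-checked with simple arithmetic. The sketch below assumes group sizes of 21, 19 and 8, obtained by summing the cluster memberships reported in the CA section, and assumes the integer counts of correctly classified samples that reproduce the stated percentages; neither set of counts is quoted directly from a table in the paper.

```python
# Group sizes inferred from the cluster memberships in the CA section (assumed).
group_sizes = {"Inner Mongolia": 5 + 16, "Shanxi": 5 + 13 + 1, "Shaanxi": 4 + 4}
# Correctly classified counts implied by the reported per-region rates (assumed).
correct = {"Inner Mongolia": 20, "Shanxi": 18, "Shaanxi": 8}

def rate(hits, total):
    """Percentage of correctly classified samples, rounded to one decimal."""
    return round(100.0 * hits / total, 1)

per_region = {region: rate(correct[region], group_sizes[region])
              for region in group_sizes}
total_samples = sum(group_sizes.values())
overall = rate(sum(correct.values()), total_samples)
cross_validated_hits = 44   # assumed: 44 of 48 gives the reported 91.7%
cross_validated = rate(cross_validated_hits, total_samples)

print(per_region)        # {'Inner Mongolia': 95.2, 'Shanxi': 94.7, 'Shaanxi': 100.0}
print(overall)           # 95.8
print(cross_validated)   # 91.7
```

Under these assumptions the per-region rates, the overall rate of 95.8% and the cross-validation rate of 91.7% are mutually consistent with a total of 48 samples.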

Discussion
The characteristics of plant-derived products can be strongly influenced by environmental and geological factors such as soil type, soil parent material, water, soil pH, and climate conditions. Element analysis is usually considered an effective tool, because plants absorb mineral elements from the soil, so the contents of mineral elements in the environment are associated to some extent with their accumulation in crops 17,18. Element analysis has been applied to the geographical origin assignment of farm products such as wine 19, honey 20, mutton 21, sheep milk 22,23, Chinese cabbage 24,25, tea 6, coffee 4, wheat 26,27 and other crops 28, as well as some aquatic products 29,30, with different degrees of success. In addition, some organic compounds and physicochemical parameters (color, diastase activity, electrical conductivity, total antioxidant activity, etc.) have been used to determine the geographical origin of food and agricultural products 23,31–33. Amino acids are important components of foods, and they contribute directly to the taste of foods and to their color on heating. Some studies have successfully determined the geographic origin of agricultural products based on amino acid analysis 1. Discrimination based on combinations of various types of substances has also been used for agricultural products in order to avoid the one-sidedness of relying on the variation of a single kind of constituent 32,33,36,37.
In the present study, neither discriminant analysis based on PCA nor CA determined the geographical origin of common buckwheat well. Similarly, stepwise LDA did not determine the geographical origin well when a single chemical family was analyzed on its own. However, LDA effectively distinguished common buckwheat from different origins when the three chemical families (mineral elements, vitamins and amino acids) were combined, with the correct classification rate and cross-validation rate reaching 95.8% and 91.7%, respectively.
Inner Mongolia covers a vast geographic area, with soil types that vary from east to west, such as dark brown soil, chestnut soil, brown loam soil, sandy land and gray brown desert soil. It is located at high latitude and has a temperate continental monsoon climate. The Shanxi Plateau belongs to the warm temperate zone and has a temperate continental climate, with complicated topography, loessal soils and brown soils. Because the Shanxi Plateau extends a long distance from north to south, differences in temperature and climate conditions within it are pronounced. The Shaanxi Plateau is located in the transitional zone between China's humid southeast and arid northwest, lies mainly in the continental middle temperate zone, and has soils consisting mainly of loessal soils. These differences make it feasible to distinguish the geographic origin of common buckwheat from Inner Mongolia, Shanxi and Shaanxi Provinces.

Conclusion
In summary, the present study showed that stepwise LDA was much more effective than PCA and CA for classifying the geographic origin of common buckwheat from Inner Mongolia, Shanxi and Shaanxi Provinces on the basis of combined mineral element, vitamin and amino acid compositions. As suggested by LDA, several variables (Mn, Se, Cu, Fe, VPP, Gly, Asp and Ala) were good classifiers for determining the geographical origin of common buckwheat, and the correct classification rate and cross-validation rate reached 95.8% and 91.7%, respectively. The results of this study therefore provide reference data and a powerful recognition tool for the origin traceability and identification of common buckwheat. However, the LDA discriminant method still needs to be validated further using more reliable data.

Methods
Data sources. Data on mineral elements, vitamins and amino acids in common buckwheat were collected from the Chinese Crop Germplasm Resources Information System (CGRIS), which provides data to the public (http://icgr.caas.net.cn). Complete data for 48 common buckwheat samples cultivated in Inner Mongolia, Shanxi and Shaanxi Provinces, the main production regions of common buckwheat in China, were obtained from the database. The contents of Cu, Mn, Fe, Zn, and Ca were determined by the atomic absorption method; the contents of Se and P were determined by hydride atomic fluorescence spectrometry and spectrophotometric methods, respectively; the contents of amino acids were determined using an amino acid analyzer; and the contents of VPP and VE were determined by gas chromatography and photocolorimetric methods, respectively. The locations and details of the samples are shown in Fig. 3 and Table 6.
Statistical analyses. Analysis of variance was first carried out on each single component across all samples to identify significant differences (p < 0.05). Unsupervised classification was performed with cluster analysis (CA) to measure the similarity between samples; CA was carried out with DPS 16.05 software using standardized data, the Mahalanobis distance and the flexible group average method. Principal component analysis (PCA) was used to reduce the dimensionality of the data for linear data analysis, and principal components with an eigenvalue greater than 1 were extracted. Linear discriminant analysis (LDA) using the stepwise method was carried out to evaluate whether samples from different regions could be mathematically distinguished. The statistical significance of each discriminant function was evaluated on the basis of the Wilks' lambda and F value criteria, and the predictive ability of the classification model was evaluated by a cross-validation test using the 'leave-one-out' procedure. Analysis of variance, PCA and LDA were performed with the IBM SPSS Statistics 19 package for Windows.
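The 'leave-one-out' procedure can be sketched as follows: each sample is held out in turn, a linear discriminant model is fitted to the remaining samples, and the held-out sample is classified. This is a plain NumPy illustration, not the SPSS implementation. It omits the stepwise variable selection, assumes prior probabilities proportional to group size, and uses synthetic data; all function names are chosen for the example.

```python
import numpy as np

def lda_fit(X, y):
    """Fit LDA: class means, inverse pooled covariance, and class priors."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    n, p = X.shape
    W = np.zeros((p, p))
    for c, m in zip(classes, means):
        d = X[y == c] - m
        W += d.T @ d                               # within-group scatter
    cov_inv = np.linalg.inv(W / (n - len(classes)))  # pooled covariance inverse
    priors = np.array([(y == c).mean() for c in classes])  # assumed priors
    return classes, means, cov_inv, priors

def lda_predict(model, x):
    """Assign x to the class with the largest linear discriminant score."""
    classes, means, cov_inv, priors = model
    quad = np.einsum("ij,jk,ik->i", means, cov_inv, means)
    scores = means @ cov_inv @ x - 0.5 * quad + np.log(priors)
    return classes[np.argmax(scores)]

def loo_accuracy(X, y):
    """Leave-one-out cross-validated accuracy of the LDA classifier."""
    n = len(y)
    hits = 0
    for i in range(n):
        mask = np.arange(n) != i                   # hold out sample i
        model = lda_fit(X[mask], y[mask])
        hits += int(lda_predict(model, X[i]) == y[i])
    return hits / n

# Synthetic, well-separated groups (48 samples; not the paper's data).
rng = np.random.default_rng(2)
centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
sizes = [16, 16, 16]
X_demo = np.vstack([c + rng.normal(size=(n, 2)) for c, n in zip(centers, sizes)])
y_demo = np.repeat(np.arange(3), sizes)
acc = loo_accuracy(X_demo, y_demo)
print(acc)
```

Because every prediction is made for a sample that the model has not seen, the leave-one-out rate is usually somewhat lower than the resubstitution rate, as with the 91.7% versus 95.8% reported for model 5.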