Comparison of metabolites and variety authentication of Amomum tsao-ko and Amomum paratsao-ko using GC–MS and NIR spectroscopy

Amomum tsao-ko, as an edible and medicinal variety, has been cultivated for more than 600 years in China. Recently, two cultivars, A. tsao-ko and Amomum paratsao-ko, were found in A. tsao-ko planting area. The two cultivars are often confused because of the similar phenotype and difficult to distinguish through sensory judgment. In this study, the non-targeted gas chromatography-mass spectrometry (GC–MS) metabolomics combined with near-infrared spectroscopy (NIRS) were used for dissecting the two cultivars with phenotypic differences. According to principal component analysis (PCA) loading diagram and orthogonal partial least squares discriminant analysis (OPLS-DA) S-plot of the metabolites, the accumulation of major components including 1,8-cineole, α-phellandrene, (E)-2-decenal, (−)-β-pinene, (E)-2-octenal, 1-octanal, D-limonene, and decanal, were present differences between the two cultivars. Seven metabolites potential differentiated biomarkers as β-selinene, decamethylcyclopentasiloxane, (E,Z)-2,6-dodecadienal, (E)-2-hexenal, (E)-2-decenal, isogeranial, 1,8-cineole and β-cubebene were determined. Although A. tsao-ko and A. paratsao-ko belong to the same genera and are similar in plant and fruit morphology, the composition and content of the main components were exposed significant discrepancy, so it is necessary to distinguish them. In this study, the discriminant model established by GC–MS or NIRS combined with multivariate analysis has achieved a good classification effect. NIRS has the advantages of simple, fast and nondestructive and can be used for rapid identification of varieties and fruit tissues.

eraceae. They are difficult to distinguish due to the similar shape and color of the dried fruits and the similar spicy aroma. For a long time, the two varieties have been used confusedly in the medicinal and spice markets. . The samples dried were stored in a cool and ventilated place until metabolite analysis was carried out. Two fruit tissues, the pericarp and the seed, were tested separately, and divided into the pericarp of A. paratsao-ko (BH-P), the seed of A. paratsao-ko (BH-S), the pericarp of A. tsao-ko (HH-P), the seed of A. tsao-ko (HH-S) (each tissue site in three biological replicates). The study's authors promisethe use of plants in the present study complies with international, national and/or institutional guidelines.
Gas chromatography-mass spectrometry (GC-MS) analysis. Solid phase micro-extraction (SPME) separation technology was used as the pretreatment method of the sample. The samples were added to a 20 mL headspace bottle, sealed with a cap, and then extracted were analyzed using a CTC trinity automatic sampler with extraction head (50/30 µm DVB/CAR on PDMS). The sample was shaken for 15 min at 250 rpm and then extracted for 30 min at 50 °C. Resolution time was 5 min, and cycle time was 50 min. GC-MS was carried out on an Agilent Technologies 7890B-5975B GC-MS unit with DB-wax Chromatographic column (30 m × 0.25 mm, 0.25 µm thickness). Chromatographic was set as follows: inlet temperature was 260 °C, helium gas was used as carrier gas, and constant flow rate was 1 mL/min. Temperature program was as follows: gradually increased to 40 °C and held for 5 min, and then increased to 220 °C at a rate of 5 °C/min, and reached a final temperature of 250 °C at a rate of 20 °C/min and held for 2.5 min. Mass spectrometry condition was as follows: the interface temperature was 260 °C, the ion source temperature was 230 °C, the quadrupole temperature was 150 °C, and the electron ionization (EI) mode was 70 eV. The scan range was 40-800 m/z in full scan mode.
The mass spectrometry data obtained from GC-MS was compared with the standard mass spectrometry database (National Institute of Standards and Technology Library 14.0 and the NEW Wiley 9 mass spectra library database) and the published literature to obtain a dataset containing peak intensity, retention time, and sample information. The original data were transformed into CDF format (NetCDF) using the Agilent GC/MS 5975 Data Analysis software and processed using the XCMS software (v.1.36.0; www. bioco nduct or. org). The processes of nonlinear retention time alignment, baseline filtration, peak identification, matching, and integration were included. The dataset was normalized using the total area of the peaks in each sample.
Near-infrared spectroscopy (NIRS) analysis. NIRS  www.nature.com/scientificreports/ ground to fine powder and pass a 100 mesh sieve. The samples were loaded into a NIRS sample cup for measurement. Each sample was repeated 3 times, and the average spectrum was taken. The sample was scanned in 64 with a spectral resolution of 8 cm −1 and a scanning range of 12,000-4000 cm −1 .
Statistical analysis. The metabolites data underwent logarithmic transformation and normalization of auto-scaling to enhance Min-Max Normalization and normality. Hierarchical cluster analysis (HCA) of the metabolites was carried out on R software to study the variations of HH and BH. The heatmap and histogram to display the experimental data structure were drawn by R-3.2 software (www.r-proje ct. org) and Sigmaplot-10.0 (sigmaplot.en.softonic.com), respectively. Principal component analysis (PCA) loading diagram and orthogonal partial least squares discriminant analysis (OPLS-DA) were used to visualize the metabolic differences between cultivars after unit variance scaling and mean centering. The differential metabolites with univariate statistical significance were extracted (P < 0.05 and VIP > 1.0). Variable influence on projection (VIP) is a widely used method for screening important variables based on model, and values greater than 1 are generally considered to be important variables for classification 18 . Here, we screen for characteristic chemical composition and spectrum based on this value. Metabolites were mapped onto biochemical pathways according to annotation in Kyoto Encyclopedia of Genes and Genomes (KEGG) (www. kegg. jp). Raw NIRS data contained vast noise and interference information. Common preprocessing methods as first derivatives (FD), multiplicative scatter correction (MSC) and Savitzky-Golay (SG) were applied to preprocess the original spectral data. PCA and partial least squares discriminant analysis (PLS-DA) were used to classify and screen the characteristic spectra using the dataset comprised of pretreated spectra to achieve better calibrations. The veracity of a model was evaluated in terms of some statistical parameters including the goodness of fit (R 2 ) and the goodness of prediction (Q 2 ). Briefly, R 2 indicates the cumulative interpretation ability and Q 2 demonstrates the cumulative prediction ability of the present model according to cross-validation, a good model should have a higher R 2 and Q 2 (close to 1). Origin 8.0 software (origin.en.softonic.com) was used to draw average NIRS and feature variable spectra. Multivariate statistical model was worked using SIMCA-P + 14.1 (Umetrics, Sweden) (www. sarto rius. com). Fig. S1, the fruits were present differences in color and size. The fruit of A. tsao-ko (HH) was large and brown, while the fruit of A. paratsao-ko (BH) was small and reddish-brown. But the dried fruits are the same color, all brown, and difficult to identify by the senses. The abundant phenotypic differences among cultivars may be reflected in the changes of metabolites, which is also an important reason for the analysis of metabolites. In order to compare the metabolite data of different orders of magnitude, 82 metabolites (Table S1) were selected from 222 metabolites detected in the GC-MS for this combinatorial analysis after data preprocessing such as exclusion of systematic errors. These metabolites were mainly volatile organic compounds of isomers, with the largest group contained 40 terpenoids accounting for 48.8% of the total metabolites, and the second largest group had 31.7% of the total identified metabolites: 26 aliphatic metabolites, followed by 5 aromatic groups (6.10%) and some other compounds (13.4%).

Analysis of metabolites. As shown in
Subsequently, the metabolites were visualized on a heatmap ( Fig. 1) using HCA to capture the maximum number of metabolites that could change between each group. Hierarchical cluster analysis based on relative levels across samples made metabolites clusters among cultivars, and the differences in the relative content of metabolites were also clearly shown. Among them, 25 metabolites (30.5%) were upregulated in the pericarp of A. tsao-ko (HH-P) and 18 metabolites (22.0%) were upregulated in the seed of A. tsao-ko (HH-S). However, there were 46 upregulated (56.1%) metabolites in the pericarp of A. paratsao-ko (BH-P) and 30 upregulated metabolites (36.6%) in the seed of A. paratsao-ko (BH-S). The research reported that 75.0% differences in plant metabolites are caused by genetic factors, and plant with same biological origin have stable morphological characteristics and clinical efficacy 19 . Biological populations with the same genetic background will undergo changes in metabolism and resistance due to changes in environmental factors, resulting in phenotypic changes. This indicated that the production and distribution of plant metabolites are usually specific to species, organs, tissues, growth stages and environments. In this study, the relative contents of terpenoids in the tissues were significantly changed, followed by aliphatic metabolites and other metabolites were relatively stable. The volatile chemicals produced by plants including a wide variety of short-chain alcohols, aldehydes, ketones, esters, aromatic phenols, and lactones, as well as mono-and sesquiterpenes, which are major constituents of volatile organic chemicals produced in response to wounding and insect attack 20 . There was a research report that aliphatic and alkaloid products will highly enrich during traumatic stress 21 . The monoterpenes such as ( +)-sabinene and α-and β-thujone have been shown to resistance pathogens and herbivores in the western redcedar foliage 22 . BH is brightly colored and more attractive to insects. And the number of fruits on BH plant is less than HH plant, resulting in it may be more vulnerable when faced with the same external threat. Plants protect themselves through changing their physiology, investing more metabolic flux into defensive metabolism when under stress, such as producing secondary metabolites to help maintain dynamic balance and health. It has been reported that essential oils containing (E)-β-ocimene may have a positive effect on the defense of Schinus terebinthifolius 23 . β-Caryophyllene promotes cell proliferation and migration and wound healing through a variety of ways and mediates analgesic effects to relieve pain 24 . β-Ocimene is usually induced from vegetative plant tissues after the plant invaded and regulates plant interactions with parasites and herbivorous predators, which is often used as chemical indicators for attracting natural enemies of herbivorous insects in a variety of plants 25 . Studies have found that volatile signals, such as cis-jasmone, generated in plants attacked by herbivores, also induced β-ocimene emissions in undamaged neighbors to participate in defense 26 . In addition to its impacts on insects, β-ocimene mediates the expression of several genes related to plant defense through activating the jasmonic acid signaling pathway, suggesting the www.nature.com/scientificreports/ compound plays an active role in indirect insect defense and plant-to-plant communication 27,28 . Unsaturated aldehydes such as (E,E)-2,4-decadienal produced by diatoms during cell damage have an invisible inhibitory effect on the reproduction of herbivorous crustaceans 29 . Interestingly, these metabolites are also highly expressed in BH, which may be partly related to the defense mechanism of fruits. In the field investigation, it was found that HH and BH were usually cultivated under alpine forests, and most farmers adopted the method of "heavy planting and light managing" to make them grow naturally. So those plants were always in a state of "imitating wild". The deep forest environment is complex and diverse, so we speculate the reason for this difference may be the joint action of genetic characteristics and growth environment. Studies have reported that there was a phenomenon of removing the pericarp and only selecting the seed as the use part of HH 2 . Our results showed that the pericarp also has a high chemical composition and can be comprehensively utilized to increase the added value rather than discarded in the actual production and processing.
For displaying data structure and species-specificity, the dimensionality of the data was reduced to mapping and visualization the groups of samples. Regarding the metabolites PCA 3D scores plot (Fig. 2a), the three principal components explained 91.5% of the overall variance of the metabolite profiles, 50.5%, 26.7% and 14.3% for PC1, PC2 and PC3. The R 2 value was 0.982 and the Q 2 value was 0.916. HH and BH showed clear separation, and the pericarps and seeds were also independent groups. The tight grouping of each biological replicate indicated the model validity, as well as technical variability and low biological. The loading plot (Fig. 2b) exposed metabolites that contributed to tissues and fruits segregation pattern 30 (Fig. 2c), HH and BH were strictly divided into two broad categories. The clustering distance between cultivars was larger than individual tissues, revealing that the differences were mainly reflected in cultivars rather than individual tissues.
Analysis of differential metabolites. OPLS-DA maximize class separation through a supervised priori knowledge, rather than interpreting variations within a set of samples. The model was applied to assess the effects of metabolites in different varieties and highlight the abundance of metabolites in the pericarps and seeds 30 . The two models score diagram (Fig. 3a,b) showed quite effective separation among varieties and tissues with good predictive and excellent classification ability (The pericarps; R 2 100% and Q 2 99.9%, the seeds; R 2 99.9% and Q 2 99.8%). The two OPLS-DA models achieved 100% classification accuracy in the condition of without overfitting through 200 iteration tests (R 2 Y < 0.4), indicating that the model was applicative (Fig. S2). The inspection of S-plot, a tool generated through plotting the correlation against the covariance, revealed pivotal metabolites responsible for noteworthy distinct variation among the pericarps and seeds (Fig. 3c,d).  32 . Previous studies have reported that BH and HH have similar effects and are often used together as substitutes for each other 33 . In this study, it was found that the compositions and accumulative contents of the main components showed significant discrepancy between the two varieties, which might lead to different biological activities.  (Table S2) specifically expressed between varieties were eventually identified from the 82 metabolites, and 7 metabolites potential differentiated biomarkers were screened (Fig. 4). Among them, 3 metabolites, β-selinene, (E,Z)-2,6-dodecadienal and (E)-2-hexenal, were expressed only in BH. The other 4 metabolites, (E)-2-decenal, isogeranial, 1,8-cineole and β-cubebene, were expressed only in HH, whereas 1,8-cineole and β-cubebene were expressed only in HH-S. It has been reported that there are significant differences in the shape and size and a relationship between the content of volatile oil and fruit phenotype of HH among different populations 31 . Some scholars extracted the volatile oil from HH by modified-solvent free microwave extraction www.nature.com/scientificreports/ (M-SFME) 4 , ultrasonic extraction 34 and hydrodistillation 35 . The results revealed that chemical composition of volatile oil was very complex. Chemical composition and relative content were different with different extraction methods and regions, but the main components of volatile oil were similar, which have also been confirmed in our results. Huang et al. identified 47 components from HH and 36 components from BH, and found a total of 20 chemical components in common between the two varieties, but the main chemical components were different, which was consistent with our research results 36 . In this study, it was found that HH and BH shared 69 metabolites, which may be related to analytical methods and samples.
Analysis of metabolic pathway of the different metabolites and specific metabolites. Enrichment analysis showed that the different metabolites and specific metabolites were mainly concentrated in C10 isoprenoids, fatty aldehydes, ketones, C15 isoprenoids and sesquiterpenoids (Fig. S3). Isoprenoid compounds might be particularly relevant in adapting plants to adverse external conditions by serving as additional and/ or alternative protection mechanisms and play vital roles in all living systems. Fatty aldehydes are important components of the cellular lipidome and serve intermediates in the interconversion between fatty acids and fatty alcohols. Fatty aldehydes are crucial substances that mediate a wide range of vital physiological functions, particularly lipid peroxidation. Ketones have long been considered to be abnormal and unpopular byproducts of incomplete oxidation of fat. They only recognized as normal circulating metabolites in the early twentieth century, and intense ketogenic diets were found to be highly effective in treating drug-resistant epilepsy in children. Circulating ketones can replace glucose as the brain's primary fuel during hyperketonemia.
There is growing evidence of mitochondrial dysfunction and reduced bioenergy efficiency in the brains of sufferer with Parkinson's disease and Alzheimer's disease, since mitochondria effectively use ketones to generate ATP and may also help protect vulnerable neurons from free radical damage 37 .
In this study, a total of 9 metabolites were found to have corresponding KEGG annotations (Table S1). Among these metabolites, D-limonene is involved in the monoterpenoid biosynthesis, biosynthesis of terpenoids and steroids and limonene and pinene degradation. D-Limonene can be synthesized from (+)-α-pinene, β-pinene, nerolidol, and (R)-linalool. D-Limonene derivatives such as α-pinene, perillyl alcohol, terpineol, carvone and carveol also have important application value. Acetaldehyde is a natural substance that exists widely in plant cells and tissues and is involved in multiple metabolic pathways such as glycolysis/gluconeogenesis, phenylalanine metabolism, phosphonate and phosphinate metabolism, pyruvate metabolism, and degradation of aromatic compounds. 1,8-Cineole is formed by (−)-α-terpineol and geranyl diphosphate and is a precursor of 6-endohydroxycineole and oxidized flavodoxin. 1,8-Cineole is involved in the monoterpenoid biosynthesis and inflammatory mediator regulation of TRP channels. Menthol, 1,8-cineole and icilin interact with TROM8 channel to affect the tryptophan metabolism. (−)-β-Pinene is produced from by geranyl diphosphate by cyclization of geranyl-diphosphate diphosphate lyase, which is involved in the monoterpenoid biosynthesis. Nerolidol is a naturally occurring sesquiterpene alcohol and is synthesized as an intermediate in the production of (3E)-4,8-dimethy-1,3,7-nonatriene (DMNT), which is an herbivore-induced volatile that protects plants from herbivore damage. Nerolidol is involved in the biosynthesis of terpenoids and steroids and sesquiterpenoid and triterpenoid biosynthesis. Geranial and (Z)-citral are downstream products of nerolidol, and geranial can be synthesized by (Z)-citral. Geranial and (Z)-citral are both involved in the biosynthesis of terpenoids and steroids and geraniol degradation. β-Selinene is formed by trans, trans-farnesyl diphosphate and is involved in the sesquiterpenoid and triterpenoid biosynthesis. (E)-2-Octenal is generally considered as a plant secretion with defensive properties and is involved in the prodigiosin biosynthesis. The reaction of (E)-2-octanal and pyruvate can produce (S)-3-acetyloctanal, octanal, 1-octanol, trans-2,4-decadienal, 2-octenoic acid, etc., these compounds are all downstream products of (E)-2-octenal. In general, the different metabolites and specific metabolites were primarily involved in the monoterpenoid biosynthesis, biosynthesis of terpenoids and steroids, and sesquiterpenoid and triterpenoid biosynthesis. Terpenoids are natural products with various structural characteristics produced by isoprene biosynthesis pathway. Isoprenoid cyclase is the key enzyme leading to the diversity of isoprenoid. On various occasions, these polypentenyl bisphosphates undergo a series of cyclists to generate the parent skeletons of monoterpenes (C10), sesquiterpenes (C15) and diterpenes (C20), triterpenoid (C25), followed by various modifications such as hydroxylation, methylation, and glycosylation to produce thousands of different isoprene metabolites. Recent studies have shown that almost all isoprene compounds such as monoterpenes, diterpenes and carotenoids are synthesized by the methylerythritol phosphate (MEP) pathway in plastids, while the mevalonate (MVA) pathway is responsible for the biosynthesis of sesquiterpenes and triterpenes 38 . The two pathways of terpenoid biosynthesis also communicate with each other. Studies found that intermediates generated from the MVA and MEP can transport between cytoplasm and plastids, which has been found in Arabidopsis thaliana and tobacco and other plants [38][39][40] . Meanwhile, these terpenoids can be transformed into each other under certain conditions, such as limonene and linalool can be converted into α-terpineol under the action of enzymes 41 .
When metabolites are involved in metabolic pathways, the final substances expressed lead to differences in plant species. For example, 1,8-cineole is rich in specific expression of HH, which can synthesize 2-exo-hydroxy-1,8-cineole and oxidized flavodoxin, etc. Studies have reported that 1,8-cineole induces arrest and senescence in HepG2 cells by oxidative stress and multiple signaling pathways, thereby inhibiting the growth of cancer cells 42 . A recent study found that 1,8-cineole can effectively bind the main viral proteinase (Mpro/3CLpro) to inhibit the coronavirus reproduction process, and it may have potential therapeutic potential as an inhibitor of COVID-19 Mpro 43 . In addition, 1,8-cineole has been widely reported for the anti-inflammatory, antioxidant, antibacterial, insecticidal and other activities 44 . Although the phenotypes of these two varieties were similar, they had their own characteristics, and their metabolic pathways and metabolic energy were still different. HH had larger fruit, higher yield and large seed capacity, but the accumulation of metabolites was less in its pericarp. It is speculated that HH carried more seeds and needed more nutrients to meet the requirements of metabolism, leading to the www.nature.com/scientificreports/ transfer of metabolic energy to seeds. Plant resources available for reproduction are limited and there may be trade-offs between two or more organs competing for the same resource library. The role of pericarps is usually to protect seeds and help them spread. More highly expressed metabolites in the pericarp of BH might have better resistance to the complex environment or be more vulnerable to threats in the same environment so that the metabolic energy transferred to the pericarp. In the face of the same environmental changes, HH and BH showed different metabolic mechanisms and reactions with different metabolites, and the biological activity and value of the two varieties also changed, which may also be due to the different gene expression profiles.
Interpretation of NIRS feature. The overlapping raw spectra showed some similar characteristic peaks and different absorption intensities, indicating differences in the accumulation of compounds in HH and BH ( Fig. 5a and S4). The band 4500-4000 cm −1 was attributed to C-H telescopic vibration in CH, CH2, CH3 and CH═CH; the band 5500-4200 cm −1 may be the combination of the second overtone of C-H and O-H; peak at 7000-5500 cm −1 represented the stretch of C-H and O-H; band at 9000-7500 cm −1 may be related to the telescopic modes of second overtone in O-H and C-H [45][46][47][48] . These bonds are abundant in terpenes, aliphatic groups, phenols and carbohydrate. However, the peak position and peak shape of different varieties were similar, which was difficult to distinguish by the original spectrum, so pattern recognition method was needed in the next analysis. NIRS data was subjected to multivariate analyses so that it could be visualized. Figure 5b showed the PCA score plot derived from NIRS data set. PC1 (64.7%) and PC2 (17.3%) explained 82.0% of the variation. The R 2 value was 0.903 and the Q 2 value was 0.819. The PCA plot could reveal overall differences of tissues of HH and BH samples from different varieties. BH, HH, pericarps and seeds were clearly distinguished in two principal components, indicating obvious discriminations among different groups. Although the classification models established by GC-MS and NIRS data were suitable for the discrimination of HH and BH, PCA separation results based on spectral data and PCA model established by metabolites presented some differences in the fraction of the first two principal components, which could be explained by the difference in chemical information detected www.nature.com/scientificreports/ in the two analytical methods. NIRS reflected the characteristics of chemical bonds and functional groups in organic compounds, mainly hydrogen-containing group information (C-H, O-H, N-H). PLS-DA model was subsequently conducted using spectral data to screen the important spectral regions of the classification model. The results showed that the accuracy of the PLS-DA classification model (R 2 99.0% and Q 2 96.2%) based on all data reached 100% without overfitting (Fig. S5). In order to verify the reliability of the classification model, two-thirds training sets and one-third test sets of samples were divided to build PLS-DA model. This model explained 84.5% of the variation, and the R 2 value was 0.995 and the Q 2 value was 0.961 (Fig. S6). Eventually, both the training set and the test set of the model reached 100% accuracy without overfitting, disclosing PLS-DA had excellent prediction capacity and more applicative in distinguishing BH and HH and their different tissues (Fig. S7). As can be seen from the Fig. 5c that the important classification variables were concentrated in the bands of 9384-8500 cm −1 , 6629-6383 cm −1 , and 5395-5127 cm −1 . The band around 12,000 cm −1 , 7500 cm −1 and 6000 cm −1 also showed some significant classification variables. These variables may be caused by accumulated differences in chemical composition.

Conclusions
In this study, GC-MS untargeted metabolomics and NIRS was firstly used to analyze and discriminate HH and BH with comparable phenotypes and metabolic differences. NIRS was regarded to be an effective way to distinguish different varieties and tissues due to it has the advantages of non-destructive, easier to operate, less expensive and efficient. The results of metabolite analysis showed that there were significant differences in the metabolite expression levels between HH and BH, especially in the main components including 1,8-cineole, α-phellandrene, (E)-2-decenal, (−)-β-pinene, (E)-2-octenal, 1-octanal, D-limonene, and decanal, so it was not recommended to substitute each other. On this basis, 7 metabolites with specific expression including β-selinene, decamethylcyclopentasiloxane, (E,Z)-2,6-dodecadienal, (E)-2-hexenal, (E)-2-decenal, isogeranial, 1,8-cineole and β-cubebene were screened as potential differentiated biomarkers. KEGG pathway analysis showed that the synthesis of terpenoids was the most important metabolic pathway between the two fruits, presumably causing to metabolic discrepancy and species-specific phenotypes. Overall, this work provides a data basis for the identification of specificity and tissue differences between HH and its related species and potential marker compounds for the species differentiation. At the same time, it preliminarily opens a new direction to HH and its related species in the search for secondary metabolites, molecular markers related to variability and response to stress and bioactive compounds.