Potential of spectroscopic analyses for non-destructive estimation of tea quality-related metabolites in fresh new leaves

Yamashita, Hiroto; Sonobe, Rei; Hirono, Yuhei; Morita, Akio; Ikka, Takashi

doi:10.1038/s41598-021-83847-0

Article
Open access
Published: 18 February 2021

Scientific Reports volume 11, Article number: 4169 (2021)

Abstract

Spectroscopic sensing provides physical and chemical information in a non-destructive and rapid manner. To develop non-destructive estimation methods for tea quality-related metabolites in fresh leaves, we estimated the contents of free amino acids, catechins, and caffeine in fresh tea leaves using visible to short-wave infrared hyperspectral reflectance data and machine learning algorithms. We acquired these data from approximately 200 new leaves in various states and then constructed regression models from the combinations of six spectral patterns (with and without pre-processing) and five algorithms. In most phenotypes, the combination of de-trending pre-processing and the Cubist algorithm was robustly selected as the best combination in each round over 100 repetitions, as evaluated by the ratio of performance to deviation (RPD). The mean RPD values ranged from 1.1 to 2.7, and most of them were above the acceptable or accurate threshold (RPD = 1.4 or 2.0, respectively). Data-based sensitivity analysis identified important hyperspectral regions around 1500 and 2000 nm. The present spectroscopic approaches indicate that most tea quality-related metabolites can be estimated non-destructively, and that pre-processing techniques help to improve estimation accuracy.

Introduction

Plants collectively produce many metabolites with estimates ranging from 100,000 to 1 million, and many metabolites are thought to play essential roles in resistance to biotic stresses and tolerance of abiotic stresses^1,2,3,4,5. In addition, natural products synthesized in plants provide indispensable resources for human health and survival⁵. Given the importance of plant metabolites to plant development and adaptation, and for human health, various quantitative and qualitative analyses have been developed. The main examples are based on chromatography techniques such as gas chromatography or high-performance liquid chromatography (HPLC) with improved mass resolution and sensitivity^6,7. However, these analytical methods require the destructive collection and pre-treatment of plant samples, which makes them slow in acquiring analytical data and unsuitable for real-time diagnosis of metabolite level.

Hyperspectral reflectance sensing is an established spectroscopic method that can provide rapid analysis without the need for sample pre-treatment. It is commonly applied to visible (VIS; 400–700 nm), near-infrared (NIR; 700–1000 nm), and short-wave infrared (SWIR; 1000–2500 nm) spectral ranges and has been used to estimate leaf pigments and water contents^8,9. The VIS is dominated by absorption of the photosynthetic pigments such as chlorophylls, carotenoids, and anthocyanins⁸. On the other hand, NIR spectroscopy is directly relevant to the overtones and combinations of the fundamental C–H, O–H, and N–H bonds in organic molecules^10,11. Thus, NIR spectroscopy provides physical and chemical information and has shown good potential in estimating different parameters in biotic samples, including metabolites in plants, agricultural products, and food^12,13,14. In addition, machine learning techniques provide powerful tools for constructing regression or classification models in agricultural indices from hyperspectral reflectance data¹⁵. The methodology of machine learning algorithms provides a flexible model not only for data-driven decision-making but also for capturing expertise into the algorithms¹⁶. The technique shows good potential for analyzing hyperspectral reflectance data with all spectral information based on a large number of bands¹⁷. Machine learning techniques also enable the assessment of hyperspectral features that are informative for high accuracy predictive modelling^16,18.

Tea plants (Camellia sinensis L.) are mainly distributed and cultivated in Asia to produce several tea types, such as green tea, oolong tea, and black tea, which are popular non-alcoholic beverages consumed all over the world. Tea-drinking reportedly has numerous and diverse health benefits¹⁹. Generally, tea quality and function are defined by the profile of various chemical components, such as catechins, caffeine, and theanine, which are characteristic of tea leaves. Tea catechins, which comprise a major class of polyphenols, contribute to the astringency and bitterness of tea and have been studied for their health functions such as antibacterial activities²⁰ and free radical scavenging activities²¹. Free amino acids, especially glutamate (Glu) and theanine, contribute to the umami taste of green tea^22,23. In particular, theanine, a unique amino acid in tea plants, promotes relaxation²⁴ and reduces blood pressure²⁵. Caffeine (1,3,7-trimethylxanthine) is a purine alkaloid whose consumption may be associated with a reduced risk for type 2 diabetes²⁶, but excessive intake of caffeine may cause inflammation of the digestive organs, insomnia, and arrhythmia²⁷. Thus, unique tea quality-related metabolites are the most important agronomic traits targeted by modern and future tea cultivation and breeding. Many analytical tools have been employed to quantify tea quality-related metabolites, including free amino acids, catechins, and caffeine, in tea samples. Many of these methods are based on HPLC^28,29 and capillary electrophoresis^30,31, but they require destructive sampling of plant tissues and are time-consuming and expensive to perform. Therefore, a rapid and accurate method for the evaluation of quantitative traits in tea leaves is in high demand for tea cultivation management and breeding programs.
The NIR-based estimation of some chemical components in ground tea leaves has been established by previous studies^32,33,34. Few studies have reported non-destructive methods for fresh leaves^35,36. Huang et al.³⁵ reported non-destructive estimation methods for four main catechins and caffeine in fresh green leaves based on VIS–NIR spectra (400–2498 nm) and a partial least squares (PLS) model. However, the outcomes of that study were limited by the small number of tea quality-related metabolites covered and by the narrow range of sample status with respect to leaf position, which makes it difficult to achieve robust results in actual agricultural management.

We previously achieved non-destructive estimation of chlorophyll and nitrogen contents in tea leaves by combining VIS–NIR–SWIR (400–2500 nm) hyperspectral reflectance data and machine learning algorithms³⁷. In the current study, to construct robust models, we acquired reflectance data and 15 tea quality-related metabolite traits from leaves grown under various nitrogen and shading conditions, from leaves at different leaf stages, and from albino tea leaves. Pre-processing techniques and machine learning algorithms for hyperspectral data were used to perform regression modelling to non-destructively estimate the contents of free amino acids, catechins, and caffeine as tea quality-related metabolites in new fresh leaves. Our modelling indicated that most tea quality-related metabolites can be estimated from VIS–NIR–SWIR hyperspectral reflectance data and machine learning algorithms and that pre-processing techniques help to improve estimation accuracy. In particular, the combination of de-trending (DT) pre-processing and the Cubist algorithm showed the highest model performance for most tea quality-related metabolites.

Results

Data distribution of reflectance data and tea quality-related metabolite contents

Original reflectance (OR) data were obtained at 1-nm steps across the 400 to 2500 nm wavelength from approximately 200 leaves in four experiment conditions. Five pre-processing methods, namely first derivative reflectance (FDR), continuum-removed (CR), standard normal variate (SNV), multiplicative scatter correction (MSC), and DT, were applied to the OR data. Several spectral patterns were observed in OR and pre-processed reflectance (Fig. 1). In the same leaves that were measured by reflectance, we analyzed catechins, caffeine, and FAAs as tea quality-related metabolites by HPLC and acquired 15 phenotypic traits. For catechins, the contents of (+)-gallocatechin (GC), (+)-catechin (C), (−)-epicatechin (EC), (−)-epigallocatechin (EGC), (−)-catechin gallate (CG), (−)-epicatechin gallate (ECG), (−)-epigallocatechin gallate (EGCG), (−)-epigallocatechin-3-O-(3-O-methyl)-gallate (EGCG-3ʺMe), and total catechins were in the ranges of 3.4–64.6, 0.5–19.2, 1.1–25.3, 8.4–339.4, 21.4–459.4, 46.8–1003.1, 91.0–619.8, 1.3–43.3, and 206.2–2528.7 μg cm⁻², respectively (Fig. 2). For FAAs, the contents of aspartate (Asp), glutamate (Glu), arginine (Arg), theanine (Thea), and total FAAs were in the ranges of 1.6–59.3, 3.1–49.1, 0.9–346.4, 0.2–264.5, and 12.3–746.0 μg cm⁻², respectively (Fig. 2). Caffeine content was in the range of 1.8–393.1 μg cm⁻² (Fig. 2). The coefficient of variation (CV) in 15 phenotypes was in the range of 33.7%–138.6% (Fig. 2).

Best combination of pre-processing and machine learning algorithms in regression model performance

Using six spectral patterns (OR, FDR, CR, SNV, MSC, and DT) and five machine learning algorithms, Random Forest (RF), Support Vector Machine (SVM), Cubist, Stochastic Gradient Boosting (SGB), and Kernel-based Extreme Learning Machine (KELM), we performed regression modelling for 15 phenotypes of tea quality-related metabolites (Supplementary Fig. S1). Model performances in the combination of pre-processing and machine learning algorithms were evaluated based on the ratio of performance to deviation (RPD) values and robustness over 100 repetitions (Supplementary Table S2). In most phenotypes, the combination of DT and Cubist (DT-Cubist) was selected most often as the best performing combination in each round among the 100 repetitions (Table 1, Supplementary Table S2). The model performance based on DT-Cubist was different between the 15 phenotypes (Fig. 3A; two-way ANOVA, P < 0.001). Except for CG and EGCG-3ʺMe, the mean RPD values in most of them were above the acceptable threshold (RPD = 1.4)³⁸. In GC, EC, ECG, EGC, total catechins, Asp, and total FAAs, the mean RPD values were above the accurate threshold (RPD = 2.0)³⁸. The modelling based on DT-Cubist significantly increased model performance over that based on OR-Cubist (Fig. 3A; two-way ANOVA, P < 0.001). These results were also supported by the root-mean-square error (RMSE) values and the coefficient of determination (R²) values as a model performance index (Fig. 3B, Table 2).
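The selection of the best-performing combination across repetitions can be illustrated with a minimal Python sketch. It assumes that, for each repetition, an RPD value is available for every (pre-processing, algorithm) pair; the function name `best_combination_counts` and the numbers in the usage example are hypothetical placeholders, not values from this study.

```python
from collections import Counter


def best_combination_counts(rpd_by_round):
    """Count how often each (pre-processing, algorithm) pair wins a round.

    rpd_by_round: list of dicts, one per repetition, mapping a
    (pre-processing, algorithm) tuple to its RPD in that repetition.
    """
    winners = Counter()
    for scores in rpd_by_round:
        # The winner of a round is the pair with the highest RPD.
        winners[max(scores, key=scores.get)] += 1
    return winners


# Illustrative, made-up RPD values for three repetitions.
rounds = [
    {("DT", "Cubist"): 2.1, ("OR", "Cubist"): 1.6, ("SNV", "RF"): 1.8},
    {("DT", "Cubist"): 1.9, ("OR", "Cubist"): 1.5, ("SNV", "RF"): 2.0},
    {("DT", "Cubist"): 2.3, ("OR", "Cubist"): 1.7, ("SNV", "RF"): 1.9},
]
counts = best_combination_counts(rounds)
print(counts.most_common(1))
```

Counting winners per round, rather than averaging RPD first, rewards combinations that are consistently best instead of those with a few very high scores.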

Table 1 Best combination of pre-processing and machine learning algorithms after 100 repetitions.

Table 2 Summary of validation and prediction performance based on DT-Cubist in 15 phenotypes for tea quality-related metabolites.

Detection of important hyperspectral regions by DSA

Data-based sensitivity analysis (DSA) was performed to detect important hyperspectral regions in models to estimate tea quality-related metabolites, and their results based on OR-Cubist and DT-Cubist were visualized at 50-nm intervals (Fig. 4). Different shapes of DSA plots were observed for caffeine and individual catechins and amino acids (Fig. 4). For catechins without CG and EGCG-3ʺMe that showed poor prediction performance, the peak region consisting of high importance values was observed around 2000 nm (Fig. 4). For amino acids, the peak region of high importance values was around 1500 nm and 2000 nm (Fig. 4), and that for caffeine was around 750 nm and 1350 nm (Fig. 4).

Discussion

To enable the non-destructive estimation of FAAs, catechins, and caffeine as tea quality-related metabolites, we performed regression modelling by combining VIS–NIR–SWIR (400–2500 nm) hyperspectral reflectance data and machine learning algorithms. Datasets of hyperspectral data and tea quality-related metabolite contents were obtained from approximately 200 new leaves grown under different N conditions in hydroponics or from shading cultivations. The data showed wide variation, with the CV of the 15 phenotypes ranging from 33.7% to 138.6% (Fig. 2). The CVs of EGCG (33.7%), ECG (64.7%), EGC (66.8%), EC (78.8%), and caffeine (37.4%) in this study were higher than those (EGCG, 24.2%; ECG, 24.3%; EGC, 34.7%; EC, 14.0%; caffeine, 16.7%) in the previous study³⁵. These results indicate that the present datasets are suitable for robust regression modelling.
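For reference, the CV used to compare the spread of the datasets can be computed as in the short Python sketch below. It assumes the usual definition, the sample standard deviation divided by the mean and expressed as a percentage; the text does not state which standard deviation estimator was used, so the n − 1 denominator is an assumption.

```python
import math


def coefficient_of_variation(values):
    """Return the CV (%) = sample standard deviation / mean * 100."""
    n = len(values)
    mean = sum(values) / n
    # Sample standard deviation (n - 1 denominator); an assumption here.
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return 100.0 * sd / mean


# Toy contents in arbitrary units, not data from this study.
print(coefficient_of_variation([10.0, 20.0, 30.0]))
```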

We applied five pre-processing techniques (Fig. 1; FDR, CR, DT, MSC, and SNV) to the OR data to enhance the more chemically associated peaks by reducing noise in the spectral data and the effects of baseline shifts and overall curvature. Then we compared the model performance of the combinations of six spectral patterns (OR, FDR, CR, SNV, MSC, and DT) and five machine learning algorithms (RF, SVM, Cubist, SGB, and KELM) based on the RPD values and robustness over 100 repetitions (Supplementary Table S2). In most phenotypes, the combination of DT and Cubist (DT-Cubist) was selected most often as the best performing combination in each round among the 100 repetitions (Table 1, Supplementary Table S2). DT has been used to correct wavelength-dependent scattering effects and to account for the variation in baseline shift and curvilinearity by fitting a second-degree polynomial through each spectrum³⁹. Therefore, these results suggest that pre-processing based on DT was effective in improving accuracy when VIS–NIR–SWIR (400–2500 nm) hyperspectral reflectance data from plant leaves were applied to the regression modelling. Cubist algorithms can generate so-called committee models that consist of a set of consecutive rule-based models, each correcting the predictions of previous member models⁴⁰; this approach is computationally efficient and well suited to big data analytics⁴⁰. Cubist is better equipped to handle extrapolation beyond the range of the training target data because it relies on rule-based multivariate linear regression models rather than an ensemble of decision trees with interconnected leaves associated with rigid target predictions⁴¹. Furthermore, Cubist algorithms achieved the best performance in a comparison of a large collection of 77 popular regression models⁴². Previous studies also showed that the Cubist algorithm has potential as an efficient modelling algorithm for various plant traits, such as leaf area index, using reflectance data⁴³.
Our previous study also showed that the Cubist algorithm had the best regression performance with VIS–NIR–SWIR (400–2500 nm) hyperspectral reflectance data and the contents of N and chlorophyll in tea leaves³⁷. These results and previous studies strongly suggest that the combination of DT pre-processing and the Cubist algorithm is suitable for regression modelling of VIS–NIR–SWIR reflectance data in plants.

These regression models based on DT-Cubist achieved mean RPD values above the acceptable threshold (RPD = 1.4)³⁸ in most of the 15 phenotypes, except for CG and EGCG-3ʺMe (Fig. 3A). For catechins and caffeine, the mean RPD values of GC, EC, ECG, EGC, and total catechins were above the accurate threshold (RPD = 2.0)³⁸, but those of EGCG and caffeine were not (Fig. 3A). A previous study based on NIR analysis of ground tea leaves indicated that the calibration models for caffeine, EGC, C, EGCG, EC, ECG, and total catechins, except for GC and EGCG-3ʺMe, had high performance with high R² (more than 0.90)³⁴. In this study, the model performance for EGCG and caffeine differed from that for the other catechins, and this difference may not be caused by chemical properties. In the dataset for our modelling, the CV values of EGCG and caffeine were drastically lower than those for other catechins (Fig. 2). This low variation in the reference data for EGCG and caffeine could have affected the regression modelling performance. Our model performance (R² = 0.50–0.86) was inferior to that (R² = 0.89–0.94) reported by Huang et al.³⁵, who also performed regression modelling based on 400–2498 nm reflectance for some catechins and caffeine in fresh new tea leaves. Whereas Huang et al.³⁵ acquired the reflectance data using a near-infrared spectrometer in a dark room, we measured leaves non-destructively on site under field conditions using a leaf clipping unit, which could introduce some spectral noise. These differences in measurement methods may affect the prediction performance. However, our measurement method was designed to be more applicable to actual agricultural fields. In the previous work of Lee et al.³⁴ and in this study, the estimation of EGCG-3ʺMe was low (Fig. 2).
The EGCG-3ʺMe content in the cultivars Benifuuki, Benifuji, and Benihomare is drastically higher than in other tea cultivars⁴⁴, including Yabukita, which was used in this study. Adding data from these high-EGCG-3ʺMe cultivars to the reference data would expand the data variation and possibly improve model performance.

The contributions of hyperspectral regions to generate the regression models for tea quality-related metabolite contents were detected using DSA. The different shapes of DSA plots based on OR-Cubist and DT-Cubist were observed for caffeine and individual catechins and amino acids (Fig. 4). These results suggest that the machine learning algorithms separately determine the variable contributions of important spectral regions to estimate each metabolite. In most catechins, the peak region consisting of high importance values was observed around 2000 nm by DSA (Fig. 4). These results overlapped with spectral regions of known absorption features associated with phenolic compounds and the bending and stretching of C–H and O–H bonds^45,46,47. In amino acids, the peak regions of high importance were observed around 1500 nm and 2000 nm by DSA (Fig. 4). These results were also consistent with previously reported spectral regions (e.g., 1520–1523 nm) for amino acid estimation⁴⁵. DSA based on DT reflected the importance of these regions more than the other pre-processing patterns (Fig. 4, Supplementary Fig. S2). NIR and SWIR spectra in fresh leaf exhibit confounding factors in water absorption regions (approximately 1350–1450 and 1850–1975 nm) that may mask optical chemical features^48,49,50. Our dataset also indicated that many catechins and FAAs contents were negatively and positively correlated with water content, respectively (Supplementary Figs. S3, S4). Although each metabolite in fresh tea leaves may be affected by the water content, the relationship between the model performance and the correlation of each metabolite and the water content was inconsistent (Fig. 3, Supplementary Figs. S3, S4), which indicates that the prediction model in this study has been constructed with an optimized model that takes into account the water content in fresh leaves.

The results of the present study suggest that spectroscopic analyses based on VIS–NIR–SWIR (400–2500 nm) hyperspectral reflectance data and machine learning algorithms have good potential to non-destructively estimate the contents of FAAs, catechins, and caffeine as tea quality-related metabolites in new fresh leaves (Table 2). Our modelling approaches also indicate that pre-processing techniques help to improve the accuracy of model performance. In particular, the combination of DT pre-processing methods and Cubist algorithms showed the highest model performance for most tea quality-related metabolites. These findings will contribute to the non-destructive real-time diagnosis of metabolite levels in tea cultivation management and breeding programs.

Methods

Plant materials

To obtain a dataset of tea quality-related metabolite contents with wide variation, a series of four experiments (Exp. 1 to Exp. 4) was conducted as described by Yamashita and Sonobe et al.³⁷. New leaves were plucked in each experiment, and their reflectance was measured on site under field conditions. The reflectance datasets of these experiments were also used in our previous study³⁷.

Exps. 1 and 2 were conducted as hydroponic nutrient tests. One-year-old rooted tea cuttings of cv. Yabukita, a popular and leading Japanese cultivar for green tea, were used in the hydroponic cultures, which were conducted under ambient light conditions in an unheated greenhouse (120 m²) at Shizuoka University (Shizuoka, Japan). A minor modification of the culture method described by Konishi et al. (1985) was used. Exp. 1 was conducted with six different nitrogen (N) nutrient levels using three to five biological replicates: 0 × N, 0.01 × N, 0.1 × N, 1 × N (40 mg L⁻¹), 2 × N, 4 × N. After approximately 6 months of treatment, one or two new leaves were plucked from each individual. Exp. 2 was conducted under low-light conditions (85% shading) with four different N nutrient levels using three biological replicates: 0 × N, 0.1 × N, 1 × N, 4 × N. After 23 days of treatment, one or two new leaves were plucked from each individual.

Exp. 3 was conducted using mature tea plants (ridges) of cv. Yabukita at Shizuoka University (Shizuoka, Japan) under low-light conditions (85% shading). New leaves at each leaf stage were plucked from approximately 15 randomly chosen shoots in sunlit and shaded tea ridges, for a total of 87 leaves. In Exp. 4, new leaves at each leaf stage were plucked from approximately 20 shoots of a 7-year-old rooted tea cutting of the Japanese albino cultivar cv. Koganemidori, which had been bred from a natural etiolated bud sport, grown in hydroponics.

Finally, 215, 201, and 201 leaf samples from these experiments were freeze-dried, ground into a fine powder, and then analyzed for free amino acids (FAAs), catechins, and caffeine, respectively.

Reflectance measurements and pre-processing

Reflectance of new leaves was measured with an ASD FieldSpec4 unit (Analytical Spectral Devices, Boulder, CO, USA) fitted with a leaf clip (diameter 20 mm) (Supplementary Fig. S5). The widest part in the center of each leaf was measured three times, positioned so that the leaf clip fitted within the leaf, and the average was taken as the representative value for that leaf. The instrument contains three detectors: visible (VIS) and near-infrared (NIR), short-wave infrared (SWIR), and SWIR 2. ViewSpec Pro Software (Analytical Spectral Devices) was used to correct differences in the spectral drifts at 1000 and 1800 nm caused by inherent variation in the sensitivities of these detectors. Finally, OR data were recorded at 1-nm steps across the entire wavelength domain from 400 to 2500 nm. Five pre-processing methods were also tested based on their success in previous studies, namely FDR, CR, SNV, MSC, and DT. FDR is effective in reducing baseline variation and increasing the resolution of spectral peak features^51,52. CR is a brightness normalization technique that has been applied to enhance related changes⁵³. MSC and SNV have also been used to eliminate the effects of noise, baseline drift, and light scattering in spectra^54,55,56. DT has been used to correct wavelength-dependent scattering effects and accounts for the variation in baseline shift and curvilinearity by fitting a second-degree polynomial through each spectrum³⁹. All methods were performed using R version 3.6.3 and the R package “prospectr” ver. 0.2.0.
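To make two of these transformations concrete, the following Python sketch implements SNV and a de-trending step for a single spectrum. It is not the “prospectr” implementation used in this study; it assumes the common formulation in which DT applies SNV first and then subtracts a fitted second-degree polynomial in wavelength, and the helper names (`snv`, `detrend`, `solve_linear`) are hypothetical.

```python
import math


def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in spectrum) / (n - 1))
    return [(v - mean) / sd for v in spectrum]


def solve_linear(a, b):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def detrend(spectrum, wavelengths):
    """SNV followed by removal of a fitted second-degree polynomial."""
    z = snv(spectrum)
    lo, hi = min(wavelengths), max(wavelengths)
    # Rescale wavelengths to [-1, 1] to keep the normal equations well conditioned.
    x = [2.0 * (w - lo) / (hi - lo) - 1.0 for w in wavelengths]
    # Normal equations for the least-squares fit z ~ c0 + c1*x + c2*x^2.
    s = [sum(xi ** k for xi in x) for k in range(5)]
    t = [sum(zi * xi ** k for xi, zi in zip(x, z)) for k in range(3)]
    c = solve_linear([[s[i + j] for j in range(3)] for i in range(3)], t)
    return [zi - (c[0] + c[1] * xi + c[2] * xi * xi) for xi, zi in zip(x, z)]
```

Under this formulation, a spectrum that is itself a smooth quadratic function of wavelength is reduced to residuals near zero, which is why DT suppresses baseline shift and overall curvature while leaving narrower absorption features in place.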

Measurement of tea quality-related metabolites

Catechin and caffeine contents were measured according to the methods described by Horie et al.⁵⁷ and Yamashita et al.⁵⁸. Dry ground leaf tissue (25 mg) was added to 5 mL of 50% (v/v) acetonitrile and shaken at 130 strokes min⁻¹ for 60 min at room temperature. The suspended samples were centrifuged at 2000×g for 15 min at 4 °C, and then the supernatants were individually passed through 0.45-µm polytetrafluoroethylene filters (Advantec, Tokyo, Japan). The resulting solutions were stored at −30 °C until they were analyzed by HPLC as described by Yamashita and Uchida et al.⁵⁸. The eight catechins (GC, C, CG, EC, ECG, EGC, EGCG, and EGCG-3ʺMe) and caffeine were quantified. The sum of the eight catechins, excluding caffeine, was expressed as total catechins.

The FAA contents were measured according to the methods described by Goto et al.⁵⁹ and Yamashita et al.⁵⁸. Dry ground leaf tissue (10 mg) was added to 10 mg of polyvinylpolypyrrolidone and 5 mL of ultra-pure water and shaken at 130 strokes min⁻¹ for 60 min at room temperature. The suspended samples were centrifuged at 2000×g for 15 min at 4 °C, and then the supernatants were individually passed through 0.45-µm cellulose acetate filters (Advantec). The resulting solutions were stored at −30 °C until analysis by HPLC as described by Yamashita et al.⁵⁸. Nine amino acids [Asp, asparagine (Asn), Glu, glutamine (Gln), serine (Ser), Arg, alanine (Aln), Thea, and γ-aminobutyric acid (GABA)] were quantified. Their sum was expressed as total FAAs.

Regression models based on machine learning algorithms

The regression modelling was conducted as described by Yamashita and Sonobe et al.³⁷ with minor modification and its flow chart was shown in Supplementary Fig. S1. For modelling, a stratified random sampling approach was applied, for which strata were formed based on experiments and treatments, and then all measurements were divided into three dataset groups as follows; a training set (50%), which was used to fit the models; a validation set (25%), which was used to estimate the prediction error for model selection; and a test set (25%), which was used for assessing the generalization error in the final selected model. To evaluate the robustness of models, this flow was repeated 100 times before pre-processing the OR and generating regression models.
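The partitioning scheme can be sketched in Python as follows. This is a minimal illustration rather than the code used in this study: it assumes each sample carries a stratum label combining experiment and treatment, and the function name `stratified_split`, the label strings, and the use of the repetition index as the random seed are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(strata, fractions=(0.5, 0.25, 0.25), seed=0):
    """Split sample indices into training, validation, and test sets.

    strata: one label per sample (e.g. "exp1_0N"); the split is made
    within each stratum so every set reflects the stratum proportions.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for idx, label in enumerate(strata):
        groups[label].append(idx)
    train, valid, test = [], [], []
    for label in sorted(groups):
        members = groups[label][:]
        rng.shuffle(members)
        n_train = round(len(members) * fractions[0])
        n_valid = round(len(members) * fractions[1])
        train += members[:n_train]
        valid += members[n_train:n_train + n_valid]
        # Whatever remains in the stratum goes to the test set.
        test += members[n_train + n_valid:]
    return train, valid, test


# Hypothetical stratum labels; one split per repetition, 100 repetitions.
labels = ["exp1_0N"] * 8 + ["exp1_1N"] * 8 + ["exp3_shade"] * 12
splits = [stratified_split(labels, seed=rep) for rep in range(100)]
```

Splitting before any pre-processing or model fitting in each repetition keeps the test set untouched by model selection, so the test-set error estimates generalization rather than fit.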

When performing regression modelling based on machine learning algorithms, a genetic algorithm (GA)-based approach was applied to select wavelengths using the “ga_pls” function (with the parameter “GA.threshold” set to 50 and the others left at their default values) of the R package “plsVarSel” ver. 0.9.6 in R ver. 3.6.3. GAs are effective for removing noninformative wavelengths to construct simpler and better prediction models. Regression models were then constructed from the selected wavelengths using the following five representative algorithms: RF, SVM, Cubist, SGB, and KELM. Overviews of these five algorithms are given in Supplementary Table S1.

RF was performed and optimized with the five hyperparameters by the R package “randomForestSRC” ver. 2.9.3. SVM was performed with the Gaussian radial basis function kernel and optimized with the two hyperparameters by the R package “e1071” ver. 1.5-8. Cubist was performed and optimized with the two hyperparameters by the R package “Cubist” ver. 0.2.3. SGB was performed and optimized with the four hyperparameters by the R package “gbm” ver. 2.1.5. KELM was performed and optimized with the two hyperparameters by the MATLAB and Statistics Toolbox Release 2016a (MathWorks, Natick, MA, USA; source code downloaded from https://www.ntu.edu.sg/home/egbhuang/). The optimizations in the hyperparameters of these machine learning algorithms were conducted based on the Bayesian optimization approach that was applied with the Gaussian process^60,61 using the R package “rBayesianOptimization” ver. 1.1.0. The hyperparameters information of these algorithms is shown in Supplementary Table S1.

The validation (v) and prediction (p) accuracy of the constructed models was assessed based on the following three indexes: the ratio of performance to deviation (RPD), the coefficient of determination (R²), and the root-mean-square error (RMSE). The performance of the prediction model was assessed according to the following three classes of RPD^38,62,63: RPD > 2, accurate prediction; RPD of 1.4–2, acceptable prediction; RPD < 1.4, poor prediction.
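The three indexes and the RPD classes can be written out as a short Python sketch. The definitions below are the commonly used ones and are assumptions insofar as the text does not give formulas: RPD is taken as the sample standard deviation of the reference values divided by the RMSE, and R² as one minus the ratio of residual to total sum of squares. The function names are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between reference and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed form)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rpd(observed, predicted):
    """Ratio of performance to deviation: SD of reference values / RMSE."""
    n = len(observed)
    mean_obs = sum(observed) / n
    sd = math.sqrt(sum((o - mean_obs) ** 2 for o in observed) / (n - 1))
    return sd / rmse(observed, predicted)


def rpd_class(value):
    """Map an RPD value to the three classes used in the text."""
    if value > 2.0:
        return "accurate"
    if value >= 1.4:
        return "acceptable"
    return "poor"
```

Because RPD scales the error by the spread of the reference data, a trait with low variation in the dataset can show a low RPD even when its absolute error is small.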

Data-based sensitivity analysis (DSA)

To extract human-understandable knowledge from supervised learning black box data mining models, we performed the DSA^64,65 by using the “Importance” function of the R package “rminer” ver. 1.4.5, as previously described by Yamashita and Sonobe et al.³⁷. Although DSA is similar to a computationally efficient one-dimensional sensitivity analysis⁶⁴, this method uses several training samples instead of a baseline vector⁶⁵ and it could be applied to black-box functions by querying the fitted models with sensitivity samples and recording their responses.
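The idea behind DSA can be illustrated with a simplified Python sketch. It is not the “rminer” implementation: it assumes that each input is swept over a fixed number of levels spanning its observed range while the other inputs are taken from sampled training rows, and that importance is the average absolute deviation of the mean response from its median, normalized to sum to one. The function name `dsa_importance` and its parameters are hypothetical.

```python
import random


def dsa_importance(predict, samples, levels=7, n_samples=20, seed=0):
    """Relative input importance for a black-box `predict` function.

    predict: callable taking one feature list and returning a number.
    samples: list of feature lists (training rows).
    """
    rng = random.Random(seed)
    picked = rng.sample(samples, min(n_samples, len(samples)))
    raw = []
    for j in range(len(samples[0])):
        lo = min(row[j] for row in samples)
        hi = max(row[j] for row in samples)
        grid = [lo + (hi - lo) * k / (levels - 1) for k in range(levels)]
        responses = []
        for g in grid:
            # Hold feature j at level g; other features come from sampled rows.
            preds = [predict(row[:j] + [g] + row[j + 1:]) for row in picked]
            responses.append(sum(preds) / len(preds))
        median = sorted(responses)[len(responses) // 2]
        raw.append(sum(abs(r - median) for r in responses) / len(responses))
    total = sum(raw)
    return [r / total if total > 0 else 0.0 for r in raw]


# Toy model: the second input is ignored, the first matters three times
# as much as the third.
rows = [[i / 10, (i * 3 % 11) / 10, (i * 7 % 11) / 10] for i in range(11)]
print(dsa_importance(lambda x: 3 * x[0] + x[2], rows))
```

Only model queries are needed, so the same procedure applies to any fitted regression model regardless of its internal structure.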

References

Dixon, R. A. & Strack, D. Phytochemistry meets genome analysis, and beyond. Phytochemistry 62, 815–816 (2003).
Afendi, F. M. et al. Data mining methods for omics and knowledge of crude medicinal plants toward big data biology. Comput. Struct. Biotechnol. J. 4, e201301010 (2013).
Weng, J.-K. The evolutionary paths towards complexity: A metabolic perspective. New Phytol. 201, 1141–1149 (2014).
Rai, A., Saito, K. & Yamazaki, M. Integrated omics analysis of specialized metabolism in medicinal plants. Plant J. 90, 764–787 (2017).
Fang, C., Fernie, A. R. & Luo, J. Exploring the diversity of plant metabolism. Trends Plant Sci. 24, 83–98 (2019).
Fernie, A. R., Trethewey, R. N., Krotzky, A. J. & Willmitzer, L. Metabolite profiling: from diagnostics to systems biology. Nat. Rev. Mol. Cell Biol. 5, 763–769 (2004).
Wolfender, J.-L., Nuzillard, J.-M., van der Hooft, J. J. J., Renault, J.-H. & Bertrand, S. Accelerating metabolite identification in natural product research: Toward an ideal combination of liquid chromatography-high-resolution tandem mass spectrometry and NMR profiling, in silico databases, and chemometrics. Anal. Chem. 91, 704–742 (2019).
Carter, G. A. & Knapp, A. K. Leaf optical properties in higher plants: Linking spectral characteristics to stress and chlorophyll concentration. Am. J. Bot. 88, 677–684 (2001).
Slaton, M. R., Raymond Hunt, E. & Smith, W. K. Estimating near-infrared leaf reflectance from leaf structural characteristics. Am. J. Bot. 88, 278–284 (2001).
Xiaobo, Z., Jiewen, Z., Povey, M. J. W., Holmes, M. & Hanpin, M. Variables selection methods in near-infrared spectroscopy. Anal. Chim. Acta 667, 14–32 (2010).
Türker-Kaya, S. & Huck, C. W. A review of mid-infrared and near-infrared imaging: Principles, concepts and applications in plant tissue analysis. Molecules 22, 1 (2017).
Nicolaï, B. M. et al. Nondestructive measurement of fruit and vegetable quality by means of NIR spectroscopy: A review. Postharvest Biol. Technol. 46, 99–118 (2007).
Liu, Y., Gao, R.-J. & Sun, X.-D. Review of portable NIR instruments for detecting fruit interior quality. Spectrosc. Spectr. Anal. 30, 2874–2878 (2010).
Prevolnik, M. et al. Accuracy of near infrared spectroscopy for prediction of chemical composition, salt content and free amino acids in dry-cured ham. Meat Sci. 88, 299–304 (2011).
Behmann, J., Mahlein, A.-K., Rumpf, T., Römer, C. & Plümer, L. A review of advanced machine learning methods for the detection of biotic stress in precision crop protection. Precis. Agric. 16, 239–260 (2015).
Chlingaryan, A., Sukkarieh, S. & Whelan, B. Machine learning approaches for crop yield prediction and nitrogen status estimation in precision agriculture: A review. Comput. Electron. Agric. 151, 61–69 (2018).
Van Wittenberghe, S. et al. Gaussian processes retrieval of leaf parameters from a multi-species reflectance, absorbance and fluorescence dataset. J. Photochem. Photobiol. B 134, 37–48 (2014).
Panda, S. S., Ames, D. P. & Panigrahi, S. Application of vegetation indices for agricultural crop yield prediction using neural network techniques. Remote Sensing 2, 673–696 (2010).
Zhang, L. et al. Chemistry and biological activities of processed camellia sinensis teas: A comprehensive review. Compr. Rev. Food Sci. Food Saf. 18, 1474–1495 (2019).
Fukai, K., Ishigami, T. & Hara, Y. Antibacterial activity of tea polyphenols against phytopathogenic bacteria. Agric. Biol. Chem. 55, 1895–1897 (1991).
Bors, W. & Saran, M. Radical scavening by flavonoid antioxidants. Free Radic. Res. Commun. 2, 289–294 (1987).
Article CAS PubMed Google Scholar
Ekborg-ott, K. H., Taylor, A. & Armstrong, D. W. Varietal differences in the total and enantiomeric composition of theanine in tea. J. Agric. Food Chem. 45, 353–363 (1997).
Article CAS Google Scholar
Narukawa, M., Morita, K. & Hayashi, Y. L-Theanine elicits an umami taste with inosine 5′-monophosphate. Biosci. Biotechnol. Biochem. 72, 3015–3017 (2008).
Article CAS PubMed Google Scholar
Lu, K. et al. The acute effects of L-theanine in comparison with alprazolam on anticipatory anxiety in humans. Hum. Psychopharmacol. 19, 457–465 (2004).
Article CAS PubMed Google Scholar
Yokogoshi, H. et al. Reduction effect of theanine on blood pressure and brain 5-hydroxyindoles in spontaneously hypertensive rats. Biosci. Biotechnol. Biochem. 59, 615–618 (1995).
Article CAS PubMed Google Scholar
Iso, H., Wakai, K., Fukui, M. & Tamakoshi, A. The relationship between green tea and total caffeine intake and risk for self-reported type 2 diabetes among Japanese adults. Ann. Intern. Med. 144, 554–562 (2006).
Article PubMed Google Scholar
Chou, T. M. & Benowitz, N. L. Caffeine and coffee: Effect on health and cardiovascular disease. Comp. Biochem. Physiol. C. 109, 173–189 (1994).
CAS PubMed Google Scholar
Miyauchi, S. et al. High-quality green tea leaf production by artificial cultivation under growth chamber conditions considering amino acids profile. J. Biosci. Bioeng. 118, 710–715 (2014).
Article CAS PubMed Google Scholar
Yang, X. R., Ye, C. X., Xu, J. K. & Jiang, Y. M. Simultaneous analysis of purine alkaloids and catechins in Camellia sinensis, Camellia ptilophylla and Camellia assamica var. kucha by HPLC. Food Chem. 100, 1132–1136 (2007).
Article CAS Google Scholar
Horie, H., Mukai, T. & Kohata, K. Simultaneous determination of qualitatively important components in green tea infusions using capillary electrophoresis. J. Chromatogr. A 758, 332–335 (1997).
Article CAS Google Scholar
Kotani, A., Takahashi, K., Hakamata, H., Kojima, S. & Kusu, F. Attomole catechins determination by capillary liquid chromatography with electrochemical detection. Anal. Sci. 23, 157–163 (2007).
Article PubMed Google Scholar
Goto, T. Studies on NIR analyses of the chemical components in fresh tea leaf and crude tea and the evaluation of tea quality. Tea Res. J. 1992, 51–61 (1992).
Article Google Scholar
Schulz, H., Engelhardt, U. H., Wegent, A., Drews, H. & Lapczynski, S. Application of near-infrared reflectance spectroscopy to the simultaneous prediction of alkaloids and phenolic substances in green tea leaves. J. Agric. Food Chem. 47, 5064–5067 (1999).
Article CAS PubMed Google Scholar
Lee, M.-S., Hwang, Y.-S., Lee, J. & Choung, M.-G. The characterization of caffeine and nine individual catechins in the leaves of green tea (Camellia sinensis L.) by near-infrared reflectance spectroscopy. Food Chem. 158, 351–357 (2014).
Article CAS PubMed Google Scholar
Huang, Y. et al. Development of simple identification models for four main catechins and caffeine in fresh green tea leaf based on visible and near-infrared spectroscopy. Comput. Electron. Agric. 173, 105388 (2020).
Article Google Scholar
Wang, Y.-J. et al. Onsite nutritional diagnosis of tea plants using micro near-infrared spectrometer coupled with chemometrics. Comput. Electron. Agric. 175, 105538 (2020).
Article Google Scholar
Yamashita, H., Sonobe, R., Hirono, Y., Morita, A. & Ikka, T. Dissection of hyperspectral reflectance to estimate nitrogen and chlorophyll contents in tea leaves based on machine learning algorithms. Sci. Rep. 10, 17360 (2020).
Article CAS PubMed PubMed Central ADS Google Scholar
Chang, C.-W., Laird, D. A., Mausbach, M. J. & Hurburgh, C. R. Near-infrared reflectance spectroscopy-principal components regression analyses of soil properties. Soil Sci. Soc. Am. J. 65, 480–490 (2001).
Article CAS ADS Google Scholar
Barnes, R. J., Dhanoa, M. S. & Lister, S. J. Standard normal variate transformation and de-trending of near-infrared diffuse reflectance spectra. Appl. Spectrosc. 43, 772–777 (1989).
Article CAS ADS Google Scholar
Walton, J. T. Subpixel urban land cover estimation. Photogramm. Eng. Remote Sens. 74, 1213–1222 (2008).
Article Google Scholar
Houborg, R. & McCabe, M. F. A hybrid training approach for leaf area index estimation via Cubist and random forests machine-learning. ISPRS J. Photogramm. Remote Sens. 135, 173–188 (2018).
Article ADS Google Scholar
Fernández-Delgado, M. et al. An extensive experimental survey of regression methods. Neural Netw. 111, 11–34 (2019).
Article PubMed Google Scholar
Johnson, D. M. An assessment of pre- and within-season remotely sensed variables for forecasting corn and soybean yields in the United States. Remote Sens. Environ. 141, 116–128 (2014).
Article ADS Google Scholar
Sano, M. et al. Simultaneous determination of twelve tea catechins by high-performance liquid chromatography with electrochemical detection. Analyst 126, 816–820 (2001).
Article CAS PubMed ADS Google Scholar
Bian, M. et al. Predicting foliar biochemistry of tea (Camellia sinensis) using reflectance spectra measured at powder, leaf and canopy levels. ISPRS J. Photogramm. Remote Sens. 78, 148–156 (2013).
Article ADS Google Scholar
Kokaly, R. F. & Skidmore, A. K. Plant phenolics and absorption features in vegetation reflectance spectra near 1.66 μm. Int. J. Appl. Earth Obs. Geoinf. 43, 55–83 (2015).
ADS Google Scholar
Couture, J. J. et al. Spectroscopic determination of ecologically relevant plant secondary metabolites. Methods Ecol. Evol. 7, 1402–1412 (2016).
Article Google Scholar
Curran, P. J., Dungan, J. L., Macler, B. A., Plummer, S. E. & Peterson, D. L. Reflectance spectroscopy of fresh whole leaves for the estimation of chemical concentration. Remote Sens. Environ. 39, 153–166 (1992).
Article ADS Google Scholar
Gao, B.-C. & Goetz, A. F. H. Extraction of dry leaf spectral features from reflectance spectra of green vegetation. Remote Sens. Environ. 47, 369–374 (1994).
Article ADS Google Scholar
Ramoelo, A., Skidmore, A. K., Schlerf, M., Mathieu, R. & Heitkönig, I. M. A. Water-removed spectra increase the retrieval accuracy when estimating savanna grass nitrogen and phosphorus concentrations. ISPRS J. Photogramm. Remote Sens. 66, 408–417 (2011).
Article ADS Google Scholar
Tsai, F. & Philpot, W. Derivative analysis of hyperspectral data. Remote Sens. Environ. 66, 41–51 (1998).
Article ADS Google Scholar
Sun, X., Subedi, P., Walker, R. & Walsh, K. B. NIRS prediction of dry matter content of single olive fruit with consideration of variable sorting for normalisation pre-treatment. Postharvest. Biol. Technol. 163, 111140 (2020).
Article CAS Google Scholar
Clark, R. N. & Roush, T. L. Reflectance spectroscopy: Quantitative analysis techniques for remote sensing applications. J. Geophys. Res. 89, 6329–6340 (1984).
Article CAS ADS Google Scholar
Maleki, M. R., Mouazen, A. M., Ramon, H. & De Baerdemaeker, J. Multiplicative scatter correction during on-line measurement with near infrared spectroscopy. Biosyst. Eng. 96, 427–433 (2007).
Article Google Scholar
Genkawa, T. et al. Baseline Correction of Diffuse Reflection Near-Infrared Spectra Using Searching Region Standard Normal Variate (SRSNV). Appl. Spectrosc. 69, 1432–1441 (2015).
Article CAS PubMed ADS Google Scholar
Ren, G., Sun, Y., Li, M., Ning, J. & Zhang, Z. Cognitive spectroscopy for evaluating Chinese black tea grades (Camellia sinensis): Near-infrared spectroscopy and evolutionary algorithms. J. Sci. Food Agric. 100, 3950–3959 (2020).
Article CAS PubMed Google Scholar
Horie, H., Maeda-Yamamoto, M., Ujihara, T. & Kohata, K. Extraction of tea catechins for chemical analysis. Tea Res. J. 2002, 60–64 (2002).
Article Google Scholar
Yamashita, H. et al. Genomic predictions and genome-wide association studies based on RAD-seq of quality-related metabolites for the genomics-assisted breeding of tea plants. Sci. Rep. 10, 17480 (2020).
Article CAS PubMed PubMed Central ADS Google Scholar
Goto, T., Horie, H. & Mukai, T. Analysis of major amino acids in green tea by high-performance liquid chromatography coupled with OPA precolumn derivatization. Tea Res. J. 1993, 29–33 (1993).
Google Scholar
Bergstra, J. & Bengio, Y. Random search for hyper-parameter optimization. J. Mach. Learn. Res. 13, 281–305 (2012).
MathSciNet MATH Google Scholar
Snoek, J. et al. Scalable Bayesian Optimization Using Deep Neural Networks. In: International Conference on Machine Learning, pp. 2171–2180 (jmlr.org, 2015).
Du, C. et al. Determination of soil properties using Fourier transform mid-infrared photoacoustic spectroscopy. Vib. Spectrosc. 49, 32–37 (2009).
Article CAS Google Scholar
Razakamanarivo, R. H., Grinand, C., Razafindrakoto, M. A., Bernoux, M. & Albrecht, A. Mapping organic carbon stocks in eucalyptus plantations of the central highlands of Madagascar: A multiple regression approach. Geoderma 162, 335–346 (2011).
Article CAS ADS Google Scholar
Kewley, R. H., Embrechts, M. J. & Breneman, C. Data strip mining for the virtual design of pharmaceuticals with neural networks. IEEE Trans. Neural Netw. 11, 668–679 (2000).
Article CAS PubMed Google Scholar
Cortez, P. & Embrechts, M. J. Using sensitivity analysis and visualization techniques to open black box data mining models. Inf. Sci. 225, 1–17 (2013).
Article Google Scholar

Download references

Acknowledgements

We thank Mr. Hiromitsu Sato for providing rooted tea cutting cv. Koganemidori. This research was supported by the Agriculture, Forestry and Fisheries Research Council (No. 1919102; R.S., Y.H., A.M., and T.I.), the Japan Society for the Promotion of Science (Grant-in-Aid for Scientific Research No. 19K06313; R.S. and Y.H., No. 20J10182; H.Y.), and the ESPEC Foundation for Global Environment Research and Technology (Charitable Trust; H.Y.). We thank Austin Schultz, PhD, from Edanz Group (https://en-author-services.edanzgroup.com/ac) for editing a draft of this manuscript.

Author information

Authors and Affiliations

Faculty of Agriculture, Shizuoka University, 836 Ohya, Suruga-ku, Shizuoka, 422-8529, Japan
Hiroto Yamashita, Rei Sonobe, Akio Morita & Takashi Ikka
United Graduate School of Agricultural Science, Gifu University, 1-1 Yanagito, Gifu, 501-1193, Japan
Hiroto Yamashita
Institute for Tea Science, Shizuoka University, 836 Ohya, Suruga-ku, Shizuoka, 422-8529, Japan
Rei Sonobe, Yuhei Hirono, Akio Morita & Takashi Ikka
Division of Tea Research, Institute of Fruit Tree and Tea Science, National Agriculture and Food Research Organization (NARO), 2769 Shishidoi, Kanaya, Shimada, Shizuoka, 428-8501, Japan
Yuhei Hirono

Contributions

H.Y., R.S., and T.I. designed this study. H.Y., Y.H., A.M., and T.I. managed the tea plants for the experiments. H.Y. analyzed the metabolite contents. H.Y. and R.S. measured reflectance and performed the modelling. H.Y., R.S., and T.I. performed most of the data visualization and writing. H.Y., R.S., Y.H., A.M., and T.I. acquired funding. All authors read and approved the manuscript.

Corresponding authors

Correspondence to Rei Sonobe or Takashi Ikka.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Table S1.

Supplementary Table S2.

Supplementary Figures.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Yamashita, H., Sonobe, R., Hirono, Y. et al. Potential of spectroscopic analyses for non-destructive estimation of tea quality-related metabolites in fresh new leaves. Sci Rep 11, 4169 (2021). https://doi.org/10.1038/s41598-021-83847-0

Received: 13 November 2020
Accepted: 09 February 2021
Published: 18 February 2021
DOI: https://doi.org/10.1038/s41598-021-83847-0