Prediction of Moisture Content for Congou Black Tea Withering Leaves Using Image Features and Nonlinear Method

Withering is the first step in the processing of congou black tea. To address the deficiencies of traditional moisture content detection methods, a machine vision based NDT (Non-Destructive Testing) method was established to detect the moisture content of withering leaves. First, visible light images of the tea leaf surfaces were collected at successive time points with a computer vision system, and color and texture features were extracted from the spatial variation of color. Then quantitative prediction models for the moisture content of withering tea leaves were established using linear PLS (Partial Least Squares) and nonlinear SVM (Support Vector Machine). The results showed correlation coefficients higher than 0.8 in absolute value between moisture content and the green component mean value (G), lightness component mean value (L*) and uniformity (U), which indicates that the extracted features have great potential for predicting moisture content. The correlation coefficient of the prediction set (Rp), root-mean-square error of prediction (RMSEP) and ratio of performance to deviation (RPD) of the SVM prediction model were 0.9314, 0.0411 and 1.8004, respectively. The nonlinear modeling method better describes the quantitative relation between image features and moisture content. With good generalization and robustness, the method provides a new approach and a theoretical basis for online moisture monitoring in the automated production of black tea.

non-destructive testing of external characteristics such as the shape, size, color and texture of foods and agricultural products. A machine vision system can acquire images of tea leaves and extract their color and texture features, so that the surface characteristics of the leaves can be quantified and accurately described. The feasibility of this approach for tea quality evaluation has been demonstrated in practical applications 12 . Based on image RGB histogram features, Borah S. et al. 13 established a qualitative assessment method for the fermentation suitability of CTC black tea. Using standard samples of roasted green tea as study objects, Yan Jun 14 presented a qualitative description method based on the external feature parameters of tea samples. Yudong Zhang, Wang Shuihua et al. [15][16][17] designed an automatic tea-category identification (TCI) system to distinguish green, oolong and black teas; it used a 3-CCD digital camera and extracted 64 color histogram features and 16 wavelet packet entropy (WPE) features to capture color and texture information, respectively. He Yong, Li Xiaoli et al. 18,19 used a multispectral imager to acquire tea leaf images at specific wavelengths, obtained shape features and gray-level co-occurrence matrix texture features, and used them to classify green teas. Tang Zhe et al. 20 presented a method for identifying the tenderness of fresh tea leaves using texture features and a support vector machine.
However, machine vision has rarely been applied to research on the withering process and moisture detection of black tea 3,8,12 . Gejima Y. N. et al. 21 used vision technology to monitor color changes during the de-enzyming process and established a model relating image color features to the moisture loss rate. Li Jie et al. 22 built an image acquisition device with a camera and a bellows, and used the Photoshop blur filter to establish a computer method for measuring the color of unbroken tea, named the L Method; by extracting the mean values of the L*, a* and b* channels from images of fresh tea leaves, they obtained the color changes along with the withering degree. When scanned at a resolution of 100 dpi, the 95% confidence interval of the result was L* = 0, −0.45 ≤ a* ≤ 0.45, −0.71 ≤ b* ≤ 0.71. When the scanned area was 11 cm × 11 cm or larger, the color parameters of the scanned tea image fully represented the actual tea color, with an SE of −0.16 ≤ L* ≤ 0.16, −0.13 ≤ a* ≤ 0.13, b* = 0. Hua, Jiang et al. 23 studied the changes in color difference (L*, a*, b* values) during black tea withering under different ambient humidity conditions.
The changes in the external characteristics of fresh tea leaves during withering fall into three aspects: color, surface smoothness, and degree of shrinkage. Current research focuses on the surface color changes of tea leaves during withering and ignores the texture changes caused by moisture loss. Texture features, however, are important indicators of changes in leaf shape, such as wrinkling and shrinkage of the surface structure. In addition, the correlation between image features and moisture loss rate, and the quantitative relation between them, remain undefined. The aim of this research is quick, nondestructive measurement of moisture content during tea processing. To this end, an image acquisition system is used to obtain visible light images of withering leaves and to extract their texture and color features; linear methods (PLS and BN) and nonlinear methods (BP-ANN, SVM and RF) are used to establish quantitative moisture models for withering tea leaves. The model performances are compared to explore the relations between color features, texture features and moisture. Finally, an evaluation model with high precision and strong generalization is obtained. This research can provide theoretical support for feedback control and online moisture testing in the automated production of black tea.

Results and Analysis
Correlation analysis on visual features and moisture content. A correlation analysis was conducted between the sample moisture content and the image feature variables. The results are shown in Table 1.
Table 1. Correlation between image feature values and moisture content. Note: ** correlation is significant at the 0.01 level; * correlation is significant at the 0.05 level.
As shown in Table 1, moisture content was significantly positively correlated with the red component mean value (R), green component mean value (G), saturation mean value (S), value (brightness) component mean value (V), b* component mean value (b*) and lightness component mean value (L*), and significantly negatively correlated with the a* component mean value (a*). This indicates that the higher the moisture content, the brighter and greener the leaves and the more saturated their color. In the correlation analysis between texture and moisture content, all feature variables except the third moment (μ3) were significantly correlated with moisture content (uniformity (U) negatively), which shows that the surface morphology and structure of the leaves changed greatly during withering. The image feature variables were also correlated with each other to some extent; for example, the correlation between R and V reached 0.987. Such inter-variable correlation causes heavy information overlap and overfitting in the established model, giving good fitting performance on the calibration set but poor external validation 24 . Hence, in further modeling, principal component analysis was applied to reduce the dimension of the image feature variables, which eliminates collinearity and shortens machine learning time while retaining the image information correlated with moisture content.
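The two checks described above (correlation of each feature with moisture content, and collinearity among the features themselves) can be sketched with NumPy as follows. This is a minimal illustration on synthetic data, not the authors' analysis; the function name `correlation_report` and the 0.95 collinearity threshold are assumptions chosen for the example.

```python
import numpy as np

def correlation_report(features, moisture, names, collinear_threshold=0.95):
    """Pearson r of each feature with moisture, plus highly collinear feature pairs."""
    X = np.asarray(features, dtype=float)
    y = np.asarray(moisture, dtype=float)
    # r between each column of X and the moisture vector y
    r_with_y = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    # feature-by-feature correlation matrix (columns are variables)
    R = np.corrcoef(X, rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(R[i, j]) >= collinear_threshold:
                pairs.append((names[i], names[j], round(float(R[i, j]), 3)))
    return dict(zip(names, np.round(r_with_y, 3))), pairs

# Synthetic illustration only (NOT the paper's data): 138 samples, three made-up features.
rng = np.random.default_rng(0)
moisture = rng.uniform(0.39, 0.78, 138)
R_mean = 100 + 80 * moisture + rng.normal(0, 2, 138)
V_mean = 0.4 * R_mean + rng.normal(0, 0.5, 138)   # nearly collinear with R_mean
U = 0.02 - 0.01 * moisture + rng.normal(0, 0.001, 138)
X = np.column_stack([R_mean, V_mean, U])
r, collinear = correlation_report(X, moisture, ["R", "V", "U"])
print(r)
print(collinear)
```

A pair such as R and V flagged here is the kind of redundancy that motivates the PCA step before modeling.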
Analysis on changes of color and texture features. The extracted color and texture feature variables form a multivariate, high-dimensional array whose variables differ in units and orders of magnitude. Hence, to investigate how the image feature variables significantly correlated with moisture content change during withering, the original data were first Z-score standardized to remove the effect of units and orders of magnitude. The results are shown in Fig. 1. As shown in Fig. 1(A), as moisture content decreased, R, G, V and L* decreased linearly; a* increased, rapidly at first and then slowly; S and b* decreased, rapidly at first and then slowly; the texture feature U increased gradually; and the average gray value (m), standard deviation (δ), smoothness (r) and entropy (e) decreased rapidly until the moisture content fell to 60% and slowly thereafter.
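Z-score standardization subtracts each variable's mean and divides by its standard deviation, so that features on very different scales become comparable. A minimal NumPy sketch follows; it assumes the sample standard deviation (`ddof=1`), which is also what MATLAB's `zscore` uses by default, and the small input matrix is made up for illustration.

```python
import numpy as np

def zscore(X, ddof=1):
    """Column-wise Z-score: subtract the column mean, divide by the column standard deviation."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sd = X.std(axis=0, ddof=ddof)
    return (X - mu) / sd, mu, sd

# Two features on very different scales (e.g. a mean channel value and uniformity U); made-up numbers.
X = np.array([[150.0, 0.012],
              [140.0, 0.015],
              [130.0, 0.019],
              [120.0, 0.024]])
Z, mu, sd = zscore(X)
print(Z.mean(axis=0), Z.std(axis=0, ddof=1))
```

When a model is later applied to a prediction set, the usual practice is to reuse the `mu` and `sd` computed on the training set rather than recomputing them.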

PLS linear correlation model. To achieve rapid and nondestructive inspection of moisture content, a model correlating image features with moisture content during the withering of black tea was established. PLS, a widely used multivariate modeling method, can effectively handle multicollinearity and improve model accuracy 17 . Figure 2A shows the RMSEC values of PLS linear models established with different NPC (number of principal components). Figure 2B shows the relation between the predicted and measured values. The corresponding NPC of the minimal
Nonlinear correlation model of SVM. Although typical neural network methods can handle nonlinearity, their sensitivity to random initial values and their tendency to overfit limit their use. SVM, proposed by Vapnik, is a high-dimensional information processing tool and an effective multivariate modeling method with great potential. By introducing a kernel function, SVM replaces the inner product operation in the high-dimensional space and solves nonlinear fitting problems; the choice of kernel function therefore directly affects the generalization ability of the model. The SVM was built with the Li-svmlab toolbox for Matlab 25 , with 10 principal components as the input and the moisture content of the withering leaves as the output. Four kernel functions (linear, polynomial, radial basis function (RBF) and sigmoid) were used to establish prediction models of the moisture content of withering leaves, and the optimal kernel function was determined by comparison. As shown in Table S1, the SVM model with the RBF kernel has the best predictive performance, with an RPD of 1.7562, higher than those of the SVM models with the other kernel functions.
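The kernel comparison described above can be sketched in Python with scikit-learn, whose `SVR` wraps LIBSVM. This is not the authors' Matlab code: it assumes standardization, then 10 principal components, then an `SVR` per kernel, scored by RPD on held-out samples. The synthetic data, the simple first-95/last-43 split and `epsilon=0.01` are assumptions for illustration only.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

def rpd(y_true, y_pred):
    """Ratio of performance to deviation: SD of reference values / bias-corrected SEP."""
    e = y_pred - y_true
    sep = np.std(e, ddof=1)   # standard deviation of the prediction errors
    return np.std(y_true, ddof=1) / sep

# Synthetic stand-in for the 15 image features (NOT the paper's data).
rng = np.random.default_rng(1)
n, p = 138, 15
moisture = rng.uniform(0.395, 0.772, n)
W = rng.normal(size=(1, p))
X = np.tanh(3 * (moisture[:, None] - 0.58)) * W + rng.normal(0, 0.1, (n, p))
train, test = np.arange(95), np.arange(95, n)

scores = {}
for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    model = make_pipeline(StandardScaler(), PCA(n_components=10),
                          SVR(kernel=kernel, epsilon=0.01))
    model.fit(X[train], moisture[train])
    scores[kernel] = rpd(moisture[test], model.predict(X[test]))
best = max(scores, key=scores.get)
print(scores, "best:", best)
```

Wrapping scaling and PCA in the pipeline ensures both are fitted on the training samples only.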
When RBF was chosen as the kernel function, considering the influence of the penalty parameter (c) and the kernel function parameter (g) on the modeling results 26 , grid search and cross-validation were used to globally optimize c and g. The optimization process was shown as
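A grid search over c and g with cross-validation, as described above, can be sketched with scikit-learn's `GridSearchCV`, where `C` plays the role of c and `gamma` the role of g. The exponential grid (c from 2^-5 to 2^15, g from 2^-15 to 2^3) follows the range commonly suggested in the LIBSVM practical guide; it, the 5-fold split and the synthetic principal-component scores are assumptions, not the paper's settings.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import SVR

# Synthetic PCA scores (10 components) and moisture values; illustration only.
rng = np.random.default_rng(2)
X = rng.normal(size=(95, 10))
y = 0.58 + 0.1 * np.tanh(X[:, 0]) + 0.02 * X[:, 1] + rng.normal(0, 0.01, 95)

# Exponential grid in powers of 2 (assumed range, LIBSVM-guide style).
param_grid = {"C": 2.0 ** np.arange(-5, 16, 2), "gamma": 2.0 ** np.arange(-15, 4, 2)}
search = GridSearchCV(
    SVR(kernel="rbf", epsilon=0.01),
    param_grid,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="neg_root_mean_squared_error",   # maximizing -RMSE == minimizing RMSE
)
search.fit(X, y)
print(search.best_params_, -search.best_score_)
```

A coarse grid like this is often followed by a finer grid around the best (c, g) pair.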

Nonlinear model of Random Forest. Random Forest (RF) is a decision tree-based ensemble algorithm whose basic unit is a decision tree. When an RF model is built, the number of input principal components (PCs) and the number of decision trees (N) directly affect model accuracy, so PCs and N need to be optimized within a certain parameter range. Twenty values of N (50 to 1000 in steps of 50) and 10 values of PCs (1 to 10 in steps of 1) were tested, and the parameters were optimized using the RMSEC of the model.
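The two-parameter search over PCs and N scored by RMSEC can be sketched with scikit-learn as below. The function `rf_grid` is hypothetical, and the demo runs a reduced grid on synthetic data to keep it fast; the full grid from the text is noted in a comment.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor

def rf_grid(X_std, y, n_trees_grid, n_pcs_grid, seed=0):
    """Return (best_pcs, best_n, best_rmsec), scoring each pair by calibration RMSE."""
    scores = PCA(n_components=max(n_pcs_grid)).fit_transform(X_std)
    best = None
    for n_pcs in n_pcs_grid:
        Xp = scores[:, :n_pcs]                       # leading principal components
        for n_trees in n_trees_grid:
            rf = RandomForestRegressor(n_estimators=n_trees, random_state=seed)
            rf.fit(Xp, y)
            rmsec = np.sqrt(np.mean((rf.predict(Xp) - y) ** 2))
            if best is None or rmsec < best[2]:
                best = (n_pcs, n_trees, rmsec)
    return best

# Synthetic standardized features for 95 calibration samples (NOT the paper's data).
rng = np.random.default_rng(3)
y = rng.uniform(0.395, 0.772, 95)
X_std = (y[:, None] - y.mean()) * rng.normal(size=(1, 15)) + rng.normal(0, 0.05, (95, 15))

# Full grid in the text: n_trees_grid=range(50, 1001, 50), n_pcs_grid=range(1, 11).
best_pcs, best_n, best_rmsec = rf_grid(X_std, y, n_trees_grid=[50, 100], n_pcs_grid=[2, 4])
print(best_pcs, best_n, best_rmsec)
```

Because an RF nearly memorizes its calibration samples, in-sample RMSEC tends to be optimistic; the out-of-bag error (`oob_score=True`) is a common alternative criterion.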
The optimization result is shown in Fig. S1. As shown in Fig. S1(A), when PCs = 4 and N = 350, the RMSEC of the model reached its minimum of 0.0012, and the Rp, RMSEP, Bias, SEP, CV and RPD of the prediction set were 0.891, 0.058, −0.007, 0.011, 0.190 and 1.612, respectively. The relation between predicted and measured values is shown in Fig. S1(B).

Model comparison and discussion. The performances of the linear models (PLS and BN) and the nonlinear models (BP-ANN, SVM and RF) were compared (Table 2). The Rp, RMSEP and Bias of the prediction set were clearly better for the nonlinear models than for the linear models. The SVM model had the best prediction performance, with RPD > 1.8, and could be used for quantitative analysis of moisture content. Its small SEP and CV indicate small sample deviation and dispersion, with good accuracy and predictive capability. In addition, according to Figs 3B and 4B, the prediction scatter of the PLS model is rather dispersed, which indicates that the linear regression model generalizes poorly because of overfitting and cannot make effective predictions on external independent samples. Compared with the PLS model, the prediction scatter of the SVM model lies more closely around the ideal line of slope 1.
As moisture content decreases during withering, the chemical components of the fresh leaves also change. For example, polyphenols are degraded: their total amount decreases while their oxidation products increase as withering proceeds 3,6 ; under the action of chlorophyllase, chlorophylls are hydrolyzed to chlorophyllides (Cda, Cdb) and phytol, which changes the color of the tea leaves 2 . Meanwhile, continuous moisture loss from the fresh leaves may cause cell collapse and structural damage, leading to morphological changes and shrinkage of the leaves 8,23 . Due to the

Materials and Methods
Sample collection. In this research, freshly picked tea leaves were used as experimental material; the cultivar was Jiukeng, and the tenderness was one bud and two leaves. The experiment was conducted in April 2017 at Jiande Qiandao Yinzhen Tea Company. The fresh leaves were withered in an artificial climate chamber at 35 °C and 50% relative humidity. Samples were taken every 0.5 h during withering; at each sampling, 6 groups of samples were collected for image acquisition. Each group weighed 15 ± 0.5 g and was spread uniformly in a sample cell of Φ70 mm, which was placed under a uniform light source for image acquisition. After image acquisition, 3 g was taken from each sample and its moisture content was measured with a moisture analyzer (MA35M-000230V1, Sartorius). The measurement was repeated 3 times, and the mean of the 3 measurements was taken as the moisture value for the corresponding image. After 11 hours of withering, the moisture content dropped from 77.18% to 39.53%, marking the completion of withering. In total, 138 sample images were collected at 23 time points, and the moisture contents of the 138 samples were used as reference values for modeling. Using the Mahalanobis distance-based Kennard-Stone (KS) algorithm 29 , 95 samples were selected as the training set; the remaining 43 samples were used as the external validation (prediction) set 30 .
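The Kennard-Stone split can be sketched in NumPy as follows. This assumes the common formulation (start from the two most distant samples, then repeatedly add the sample whose nearest selected neighbor is farthest away) with distances measured by the Mahalanobis metric; `kennard_stone` is a hypothetical helper and the feature matrix is random.

```python
import numpy as np

def kennard_stone(X, n_train):
    """Kennard-Stone selection with Mahalanobis distance; returns (train_idx, test_idx)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    VI = np.linalg.pinv(np.cov(Xc, rowvar=False))        # inverse covariance matrix
    diff = Xc[:, None, :] - Xc[None, :, :]
    d2 = np.einsum("ijk,kl,ijl->ij", diff, VI, diff)     # squared pairwise distances
    D = np.sqrt(np.maximum(d2, 0.0))                     # clip tiny negative round-off
    i, j = np.unravel_index(np.argmax(D), D.shape)       # two most distant samples
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_train:
        # distance from each remaining sample to its nearest already-selected sample
        d_min = D[np.ix_(remaining, selected)].min(axis=1)
        k = remaining[int(np.argmax(d_min))]
        selected.append(k)
        remaining.remove(k)
    return np.array(selected), np.array(remaining)

# Random stand-in for 138 samples x 15 image features.
rng = np.random.default_rng(5)
X = rng.normal(size=(138, 15))
train_idx, test_idx = kennard_stone(X, 95)
print(len(train_idx), len(test_idx))
```

KS places boundary samples in the training set, so the prediction set tends to lie inside the calibration range.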

Machine vision system.
A computer image acquisition system was designed, consisting of an image sensor, a sample pool, a uniform light source and an image processing software system. Image acquisition and data analysis followed the technical route in Fig. 4. An SLR camera 31 (Canon DS60D, Japan, 18 MP) was used as the image sensor, with ISO sensitivity 100, image size 3456 × 2304 pixels, aperture f/4.0 and exposure time 1/30 s. A cambered uniform light with an illuminance of 100 lx was used as the light source. Images were stored in RAW format and decoded with DCRaw v9.17 using the following options: white balance as shot; AHD (Adaptive Homogeneity-Directed) demosaicing 32 ; RGB values converted to the sRGB color space and stored as 16-bit TIFF. The image processing system (Chinese software copyright numbers 2013SR122183 and 2014SR149549) was developed with the Matlab GUI module and automatically analyzes the color and texture features of the images. Figure 5 shows the physical setup of the computer image acquisition system.
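The DCRaw settings listed above map onto standard dcraw command-line options: `-w` (camera white balance), `-q 3` (AHD interpolation), `-o 1` (sRGB output), `-6` (16-bit output) and `-T` (TIFF output). The Python wrapper below is a sketch assuming `dcraw` is installed and on `PATH`; the helper names and the example file name are hypothetical.

```python
import shutil
import subprocess

def dcraw_command(raw_path):
    """Build a dcraw command mirroring the decoding settings described in the text."""
    return [
        "dcraw",
        "-w",        # use the white balance recorded by the camera (as shot)
        "-q", "3",   # AHD (Adaptive Homogeneity-Directed) demosaicing
        "-o", "1",   # output color space: sRGB
        "-6",        # write 16 bits per sample instead of 8
        "-T",        # write TIFF instead of PPM
        raw_path,
    ]

def decode(raw_path):
    """Run dcraw on one RAW file (hypothetical helper; requires dcraw on PATH)."""
    if shutil.which("dcraw") is None:
        raise RuntimeError("dcraw is not on PATH")
    subprocess.run(dcraw_command(raw_path), check=True)

print(" ".join(dcraw_command("IMG_0001.CR2")))
```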
The image acquisition process was as follows: the tea samples were placed under the cambered uniform light and the SLR camera parameters were adjusted; the images acquired by the camera were then imported into the image processing system, which automatically cropped a 2000 × 1000 pixel area and analyzed the color and texture features of the images.  Based on gray-level co-occurrence matrix statistics 35 , texture analysis was carried out on images of withering leaves with different moisture contents, yielding 6 texture features: m, δ, r, μ3, U and e; in total, 15 image feature variables were obtained 18,19,36,37 .
Data processing and analysis. Partial least squares (PLS) was used for linear quantitative modeling, and BP-ANN and SVM for nonlinear quantitative modeling 19,[38][39][40][41] .
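The six texture descriptors m, δ, r, μ3, U and e share their names with the first-order statistical moments of the gray-level histogram described by Gonzalez and Woods. The NumPy sketch below assumes that histogram-based definition (a GLCM-based variant would give different values). Scaling the variance in r and μ3 by (L − 1)² is an assumed convention used in some implementations, and `histogram_texture` is a hypothetical helper.

```python
import numpy as np

def histogram_texture(gray, levels=256):
    """Six statistical texture descriptors computed from the gray-level histogram."""
    g = np.asarray(gray).ravel()
    p = np.bincount(g, minlength=levels).astype(float)
    p /= p.sum()                                   # normalized histogram p(z_i)
    z = np.arange(levels, dtype=float)
    m = np.sum(z * p)                              # average gray level
    var = np.sum((z - m) ** 2 * p)
    delta = np.sqrt(var)                           # standard deviation
    r = 1 - 1 / (1 + var / (levels - 1) ** 2)      # smoothness (variance scaled to [0, 1])
    mu3 = np.sum((z - m) ** 3 * p) / (levels - 1) ** 2   # third moment (scaled)
    U = np.sum(p ** 2)                             # uniformity
    nz = p[p > 0]
    e = -np.sum(nz * np.log2(nz))                  # entropy in bits
    return {"m": m, "delta": delta, "r": r, "mu3": mu3, "U": U, "e": e}

flat = np.full((10, 10), 128, dtype=np.uint8)      # perfectly uniform patch
ramp = np.arange(256, dtype=np.uint8).reshape(16, 16)  # every gray level exactly once
print(histogram_texture(flat))
print(histogram_texture(ramp))
```

For a 16-bit TIFF, `levels=65536` would be passed instead of 256.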
Partial least squares (PLS) is a classical multivariate calibration method: it projects the predictors onto a small number of latent variables (principal components) that maximize covariance with the response and then fits a least-squares regression on them. When establishing the PLS model, the optimal number of principal components was determined from the root mean square error of calibration (RMSEC) on the calibration set: the number of components giving the minimum RMSEC was taken as optimal.
SVM is a learning system that fits a linear function in a high-dimensional feature space: the data samples are mapped into this space through a nonlinear mapping, and linear regression is performed there.
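For reference, the standard ε-SVR prediction function (the form solved by LIBSVM) with the RBF kernel is given below, where $c$ is the penalty parameter and $g$ the kernel parameter tuned in the grid search, $\mathbf{x}_i$ are the training samples, and $\alpha_i, \alpha_i^{*}$ are the dual coefficients. This is the textbook formulation, assumed rather than quoted from the paper.

$$
f(\mathbf{x}) = \sum_{i=1}^{n} \left(\alpha_i - \alpha_i^{*}\right) K(\mathbf{x}_i, \mathbf{x}) + b,
\qquad
K(\mathbf{x}_i, \mathbf{x}) = \exp\!\left(-g \,\lVert \mathbf{x}_i - \mathbf{x} \rVert^{2}\right),
\qquad
0 \le \alpha_i, \alpha_i^{*} \le c
$$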
The Z-score (standardization) method 42 was used to preprocess the original data. Principal component analysis (PCA) was used to extract feature variables from the original data 43 , and the extracted components were used as the input variables of the models. The parameters Rc, Rp, RMSEC, RMSEP, Bias, RPD, SEP and CV used in the literature [44][45][46] served as evaluation indicators of model performance. In general, the higher the Rp and RPD, and the smaller the RMSEP, SEP, CV and absolute Bias, the better the prediction performance and accuracy of the model 33 .
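The prediction-set indicators can be computed as sketched below, assuming the usual chemometric definitions: Bias is the mean prediction error, SEP the bias-corrected standard deviation of the errors, and RPD the standard deviation of the reference values divided by SEP. The paper does not define CV, so it is omitted here. The four-sample input is made up for illustration.

```python
import numpy as np

def prediction_metrics(y_true, y_pred):
    """Rp, RMSEP, Bias, SEP and RPD for a prediction set (common chemometric definitions)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    e = y_pred - y_true
    n = len(e)
    bias = e.mean()                                     # mean prediction error
    rmsep = np.sqrt(np.mean(e ** 2))                    # root-mean-square error of prediction
    sep = np.sqrt(np.sum((e - bias) ** 2) / (n - 1))    # bias-corrected standard error
    rp = np.corrcoef(y_true, y_pred)[0, 1]              # correlation, predicted vs measured
    rpd = np.std(y_true, ddof=1) / sep                  # ratio of performance to deviation
    return {"Rp": rp, "RMSEP": rmsep, "Bias": bias, "SEP": sep, "RPD": rpd}

m = prediction_metrics([0.40, 0.50, 0.60, 0.70], [0.42, 0.49, 0.61, 0.69])
print(m)
```

With these definitions RMSEP² = Bias² + SEP²·(n − 1)/n, which is a quick consistency check on reported values.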

Conclusion
(1) Based on the linear methods PLS and BN and the nonlinear methods BP-ANN, SVM and RF, machine vision models for predicting moisture content during the withering of tea leaves were established. The SVM model performed better than the other models, with a correlation coefficient of 0.9314 between predicted and measured values, a root-mean-square error of prediction of 0.0411 and RPD > 1.5. This shows that the relation between moisture content and image features is nonlinear, and that nonlinear methods better represent the quantitative relation between image information and moisture content.
(2) Significant correlations were found between the moisture content of withering leaves and multiple color and texture feature variables. The correlation coefficients of G, L* and U exceed 0.8 in absolute value, which shows that leaf image features can represent changes in moisture content.
(3) The study provides a theoretical basis for developing online moisture monitoring technology and equipment for the automated withering of Congou black tea. It has good application prospects and could also be applied to the processing of other teas, such as the spreading of green tea and the withering of oolong tea.