Evaluation of the total volatile basic nitrogen (TVB-N) content in fish fillets using hyperspectral imaging coupled with deep learning neural network and meta-analysis

Recently, hyperspectral imaging (HSI), as a rapid and non-destructive technique, has generated much interest due to its unique potential to monitor food quality and safety. The specific aim of this study was to investigate the potential of HSI (430–1010 nm) coupled with a linear deep neural network (LDNN) to predict the TVB-N content of rainbow trout fillets during 12 days of storage at 4 ± 2 °C. After the acquisition of hyperspectral images, the TVB-N content of the fish fillets was measured by a conventional method (micro-Kjeldahl distillation). To simplify the calibration models, nine optimal wavelengths were selected by the successive projections algorithm. A seven-layer LDNN was designed to estimate the TVB-N content of the samples. The LDNN model showed acceptable performance for the prediction of the TVB-N content of fish fillets (R2P = 0.853; RMSEP = 3.159 and RPD = 3.001). The performance of the LDNN model was comparable with the results of previous works. Although the results of the meta-analysis did not show any significant difference between the various chemometric models, the least-squares support vector machine algorithm showed better prediction results than the other models (RMSEP = 2.63 and R2P = 0.897). Further studies are required to improve the predictive power of the deep learning model for assessing rainbow trout quality.


Results and discussion
TVB-N value. Changes in the TVB-N values of 210 subsamples (30 fish fillets per day) during storage are presented in Table 1. The initial TVB-N content of the rainbow trout fillets was 8.70 ± 0.86 mg N/100 g, which increased significantly during storage and finally reached 36.79 ± 4.38 mg N/100 g; these data are comparable with previous results for rainbow trout fillets [22][23][24][25] .
The threshold of acceptability for the TVB-N of rainbow trout, as a freshwater fish, is considered to be 20 mg N/100 g [26][27][28] . Based on this critical value, the acceptable shelf-life of the analyzed rainbow trout was 8 days. Furthermore, as shown in Table 1, the ranges of TVB-N for the calibration and prediction sets were 38.8 and 35.48 mg N/100 g, respectively. Therefore, the differences between the fresh and stale samples were pronounced over the 12 days of storage, which helped to establish a suitable and robust calibration model for predicting the total volatile basic nitrogen and consequently estimating the shelf-life and quality of the rainbow trout subsamples 29 .
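For reference, TVB-N values such as those in Table 1 are derived from the titration volumes of the micro-Kjeldahl distillate. The expression below is a hedged sketch that assumes the commonly used titration formula: the factor 14 (molar mass of nitrogen, g/mol) and the scaling to 100 g of sample are assumptions, not values stated in the text. The symbols V_1, V_2, c and m are the quantities defined in the methods (titration volumes of the sample and the blank in mL, HCl concentration in mol/L, and mass of minced muscle in g).

```latex
\[
\text{TVB-N}\ (\text{mg N}/100\ \text{g fish muscle})
  = \frac{(V_1 - V_2)\, c \times 14}{m} \times 100
\]
```

With these units, (V_1 - V_2) c is in mmol of HCl, multiplying by 14 gives mg of nitrogen, and dividing by m and multiplying by 100 expresses the result per 100 g of muscle.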
Spectral feature analysis. The mean reflectance spectra (400-1000 nm) of fish fillets with different TVB-N values are illustrated in Fig. 1. The spectral reflectance curves of samples on the various storage days followed an almost similar trend. However, over time, there was an increase in spectral reflectance across the whole investigated waveband range. The amplitude of variation of spectral reflectance was recognizable on the spectra.

Table 1. Descriptive statistics for the TVB-N content of samples, measured by the conventional method. Different letters indicate significant differences between sample means (p < 0.05).

Overall, the bands in the visible range (400-700 nm) can be connected to the change in fish color. The significant absorption regions observed in the near-infrared range (700-1000 nm) may be related to overtones of several chemical bonds, such as N-H (760-840 nm: protein), C-H (930 nm: protein compounds), O-H (690-720 nm and 970 nm: water and lipid oxidation compounds) and S-H (930 nm: methylene) stretching 8,13,14,32-34 .

Optimal wavelength selection. In the current study, the SPA method was used to choose the wavebands most closely related to the TVB-N content of the fish from the full spectral range. Nine wavelengths (459, 552, 616, 629, 695, 760, 896, 956 and 986 nm) were selected as the optimal variables, which covered the whole spectral range. Figure 2 shows the frequency of the waveband ranges selected by different methods in previous studies. As seen in Fig. 2, the optimal wavelengths covered almost the full spectral range. More than half of these wavebands (5 out of 9) were located in the visible region of the spectrum (400-750 nm). The changes in chemical compounds (e.g. protein, fat and water) that occur during freshness loss can be directly reflected in fish fillet color and result in spectral variations in the visible region 35 . These results agreed with several previous studies (Fig. 2 and Table 3). Moreover, the results of the meta-analysis (Fig. 2) showed that the most frequent waveband ranges were 400-500 nm (25%) and 501-600 nm (20%). However, in the current study, the largest group of selected wavebands (616, 629 and 695 nm) was located in the main absorption range of 601-700 nm, which is related to the variation of H2S produced by microbial activity 36 . The waveband of 890 nm is ascribed to C-H and N-H stretching associated with protein and the methylene group of lipids 11 . Water is the major component of the fish fillet, and the selected waveband at 950 nm is assigned to the second overtone of O-H stretching in water and the third overtone of C-H and C-H2 stretching of fat 11,37 . A waveband was also observed near 1000 nm (990 nm), which is mainly related to the N-H stretching of proteins 11,21 .

Hyperspectral imaging (HSI) coupled with a linear deep neural network, as a chemometric algorithm, was applied to evaluate the TVB-N content of rainbow trout fillets. It is still very hard to obtain a large dataset of TVB-N values for fish samples, because the acquisition tools are manual, time-consuming and destructive. Therefore, the main characteristic of the available data is the small sample size (210 samples), which restricts the functionality of the machine learning tools used for prediction. The LDNN model was proposed to resolve this problem: with a linear activation function, the deep neural network does not overfit the small sample, so any number of layers can be used and trained without concern about overfitting. The performance of this model in the calibration, cross-validation and prediction sets is presented in Table 2. The results showed that the LDNN model exhibited acceptable performance for the prediction of TVB-N content (R2P = 0.853; RMSEP = 3.159 and RPD = 3.001).
Moreover, to better compare the performance of the LDNN algorithm, two well-known models, PLSR and LS-SVM, were also established. All of the chemometric models exhibited good performance in the prediction of the TVB-N value (0.82 < R2P < 0.9 and 2.5 < RPD < 3). In the calibration set, the PLSR and LS-SVM models showed better performance than the LDNN algorithm (lower RMSEC and higher R2C). However, in the prediction set, the lowest RMSEP, the smallest difference between RMSEP and RMSEC, and the highest RPD were obtained with the LDNN model (Table 2). This can be considered evidence of the greater stability of the LDNN model in predicting the TVB-N values of rainbow trout fillets. Therefore, deep learning, as a state-of-the-art method for processing large and complicated datasets, showed promising performance in resolving regression problems and evaluating the TVB-N value of fish fillets. In this regard, Yu et al. (2017) used HSI combined with a deep learning algorithm (stacked auto-encoders (SAEs)) followed by logistic regression (LR) to classify fresh and stale shrimp based on TVB-N value during cold storage, and reported that the established SAEs-LR model was satisfactory for discriminating the freshness grade of shrimp (R2P = 0.858, RMSEP = 0.19 and RPD = 2.64) 19 . Yu, Wang, Wen, Yang, and Zhang (2019) also used hyperspectral data (900-1700 nm) to determine the total volatile basic nitrogen (TVB-N) content in shrimp. They compared the successive projections algorithm (SPA) and a deep-learning-based stacked auto-encoders (SAEs) algorithm for selecting spectral features. The SAEs-LS-SVM and SPA-LS-SVM models showed suitable performance, with RPD values of 3.58 and 3.11, respectively, which is comparable with our findings. Deep learning methods can learn representational features from the dataset during training, and showed stronger ability than traditional methods in the current study (RPD > 3).
Based on Table 3 and Fig. 3, the performance of the LDNN model was comparable with the results of previous works on predicting the TVB-N value of various meat and seafood products using hyperspectral imaging systems [5][6][7]33 . Figure 3a,b shows the effect of various chemometric algorithms on the predictive power of the hyperspectral imaging system. The meta-analysis indicated that although linear models showed a higher R2P on average, the lowest RMSEP was obtained for non-linear models (R2P (linear model) = 0.895 ± 0.0417 vs. R2P (non-linear model) = 0.876 ± 0.0406; RMSEP (linear model) = 2.945 ± 1.36 vs. RMSEP (non-linear model) = 2.648 ± 1.032; P > 0.05). The result was in agreement with the LDNN finding. The meta-analysis did not show any significant difference between the prediction power of the various chemometric models. However, the highest R2P and RMSEP values were obtained for the PLSR model. Since the LS-SVM model showed a relatively high R2P together with a lower RMSEP, this model can be considered more stable than PLSR for estimating TVB-N content in fish and meat products. It should be noted that the performance of a chemometric model is a function of several factors: the number of samples and variables; the optimal waveband selection method; the type and chemical structure of the samples; the hyperspectral imaging wavelength range; etc. In this regard, Cheng et al. (2016) reported that when the SPA method was used for optimal wavelength selection, the best performance was obtained with the LS-SVM model, while the GA method provided the best performance for the MLR model 33 .

In the calculation of TVB-N (mg/100 g fish muscle), V1 is the titration volume for the tested sample (mL); V2 is the titration volume of the blank sample (mL); c is the actual concentration of HCl (mol L−1); and m is the weight of minced muscle (g) 12,40 .

ROI identification and extraction of spectral data. The regions of interest (ROIs) of the hyperspectral images were identified manually using the software of the hyperspectral imaging system (LabVIEW 2011, National Instruments Co., Austin, USA). A mean spectrum for each fish fillet was obtained and used as input data for evaluating the TVB-N values of the samples with a trained deep learning algorithm. The Savitzky-Golay (S-G) algorithm was used to reduce the noise of the extracted average spectra (Unscrambler 10.4; CAMO, Trondheim, Norway) 5 .
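The two steps described above (averaging the ROI pixels into one spectrum per fillet, then Savitzky-Golay smoothing) can be sketched in plain Python. This is only an illustration, not the LabVIEW or Unscrambler procedure used in the study: the function names `mean_spectrum` and `savgol5` are hypothetical, and the 5-point window with a quadratic polynomial is an assumed setting, since the text does not report the S-G parameters.

```python
def mean_spectrum(cube, mask):
    """Average the spectra of all pixels inside the ROI.

    cube: rows x cols x bands nested lists of reflectance values.
    mask: rows x cols nested lists of booleans (True = pixel is in the ROI).
    """
    n_bands = len(cube[0][0])
    totals = [0.0] * n_bands
    count = 0
    for cube_row, mask_row in zip(cube, mask):
        for pixel, inside in zip(cube_row, mask_row):
            if inside:
                count += 1
                for b, value in enumerate(pixel):
                    totals[b] += value
    if count == 0:
        raise ValueError("ROI mask selects no pixels")
    return [t / count for t in totals]


# Standard 5-point quadratic Savitzky-Golay smoothing coefficients
# (window length and polynomial order are assumptions, see lead-in).
SG5 = (-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35)


def savgol5(spectrum):
    """Smooth interior points with the 5-point quadratic S-G filter.

    The two points at each end are copied unchanged.
    """
    out = list(spectrum)
    for i in range(2, len(spectrum) - 2):
        out[i] = sum(c * spectrum[i + k - 2] for k, c in enumerate(SG5))
    return out
```

For real hyperspectral cubes, the same operations are usually done with array libraries, for example `scipy.signal.savgol_filter` for the smoothing step.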

Materials and methods
Optimal wavelength selection. Data extracted from the hyperspectral images of each fish fillet sample comprise hundreds of contiguous wavebands. However, most of these wavelengths are poorly correlated with TVB-N content. The successive projections algorithm (SPA), a forward selection method, was used to select the most informative wavelengths. This algorithm starts with one waveband and incorporates a new one at each iteration until a specified number of wavebands with minimum redundancy is obtained 41 . The SPA procedure was conducted in Matlab 2016a (The MathWorks Inc., MA, USA).
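The projection step of SPA can be sketched as follows. This is a minimal illustration in plain Python, not the Matlab routine used in the study: the names `spa_select`, `start` and `n_select` are hypothetical, the spectra are assumed to be mean-centered by the caller, and the sketch omits the search over starting bands and the validation-based choice of the number of bands that a full SPA implementation typically includes.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def spa_select(X, start, n_select):
    """Select n_select waveband indices by successive projections.

    X: list of samples, each a list of band values (samples x bands).
    start: index of the initial waveband.
    """
    n_bands = len(X[0])
    # Work on one vector per waveband (the columns of X).
    cols = [[row[j] for row in X] for j in range(n_bands)]
    selected = [start]
    while len(selected) < n_select:
        ref = cols[selected[-1]]  # last selected band, already projected
        ref_sq = dot(ref, ref)
        best, best_norm = None, -1.0
        for j in range(n_bands):
            if j in selected:
                continue
            if ref_sq > 0:
                # Remove the component of band j that lies along ref.
                scale = dot(cols[j], ref) / ref_sq
                cols[j] = [v - scale * r for v, r in zip(cols[j], ref)]
            norm = dot(cols[j], cols[j])
            if norm > best_norm:
                best, best_norm = j, norm
        # The band with the largest residual norm is the least redundant.
        selected.append(best)
    return selected
```

Because each candidate band is projected onto the orthogonal complement of the bands already chosen, a band that is collinear with a selected one ends up with a near-zero norm and is not picked, which is what "minimum redundancy" means here.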

Deep regressor model for TVB-N prediction.
Our objective is to predict the TVB-N value of fish fillets using a deep learning method. The main approaches to regression analysis are analytical methods and neural network (NN) based methods. The former assume a mathematical equation and aim to find the optimum parameters of this equation describing the relationship between variables, while the latter train an NN as a black box, with the predictors as input and the outcome variables as output, to estimate their relationship.
Artificial neural network (ANN). An ANN is based on artificial neurons (i.e., a collection of connected units), as shown in Fig. 5.
Each connection can transmit a signal to other neurons. The output y of a neuron is computed as described in ref. 42. In this research, a DNN was used for regression of the TVB-N content of fish fillets from a set of existing data. The advantages of NNs over analytical methods include their efficient inference step and their ability to handle noise in the data.
The training (learning) phase of NNs is performed using examples, without the network being programmed with task-specific rules.
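As a hedged illustration of the neuron output mentioned above, the usual textbook form is given below. The notation is an assumption and may differ from that of ref. 42: x_i are the inputs, w_i the connection weights, b the bias and f the activation function. The second expression shows what follows when f is the identity (a linear activation) in every one of L layers.

```latex
\[
y = f\!\left(\sum_{i=1}^{n} w_i x_i + b\right)
\]
\[
h^{(l)} = W^{(l)} h^{(l-1)} + b^{(l)},\quad l = 1,\dots,L,\quad h^{(0)} = x
\;\Longrightarrow\;
h^{(L)} = \underbrace{W^{(L)} \cdots W^{(1)}}_{W}\, x + \tilde{b}
\]
```

Here \tilde{b} collects the bias terms propagated through the layers. With linear activations, the stacked layers therefore compose to a single affine map of the input, which is why such a network behaves as a linear regression model regardless of its depth.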
Network structure. Because of the small sample size, training nonlinear NN models for our problem leads to overfitting of the training data. Therefore, we are restricted to linear regression, i.e. finding a line (or a more complex linear function) that most closely fits the data according to a specific mathematical criterion (e.g., MSE). Linear models do not overfit the data because of their simple structure. A seven-layer linear deep neural network (a multilayer perceptron, MLP) was designed for regression of the TVB-N content of fish fillets; its structure is presented in Fig. 6. The LDNN model was built using the Keras library, trained on 140 samples (training data) and tested on 70 samples (test data).
Model evaluation. The spectral data selected by the SPA algorithm were used as input for training (calibration set) and testing (prediction set) the deep learning models. The assessment factors were the adjusted determination coefficients (R2C(adj), R2CV(adj) and R2P(adj)), the corresponding root mean square errors (RMSEC, RMSECV and RMSEP) and the residual predictive deviation (RPD) 43 .
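The assessment factors listed above can be computed as in the following sketch. The function names are hypothetical and the exact definitions are assumptions: the adjusted R2 uses the usual correction for the number of predictors, and RPD is taken as the sample standard deviation (n − 1 denominator) of the reference values divided by the RMSE; the paper does not spell out either formula.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error (RMSEC, RMSECV or RMSEP, depending on the set)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_samples, n_predictors):
    """Adjusted R2 for n_samples observations and n_predictors inputs."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


def rpd(y_true, y_pred):
    """Residual predictive deviation: SD of reference values / RMSE."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    sd = math.sqrt(sum((t - mean_t) ** 2 for t in y_true) / (n - 1))
    return sd / rmse(y_true, y_pred)
```

For example, with nine selected wavebands and 70 prediction samples, `adjusted_r_squared(r2, 70, 9)` would give the adjusted value for the prediction set under these assumptions.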
Statistical analysis. TVB-N analysis was conducted in 30 replicates and the data were reported as mean ± standard deviation (SD). Statistical analyses were carried out by one-way ANOVA. Tukey's test was applied to determine the significant differences between the means (P ≤ 0.05). Statistical calculations were performed in Minitab Version 17 statistical software (Minitab Inc., Pennsylvania, USA).
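The F statistic behind a one-way ANOVA can be sketched as below. This only illustrates the calculation and is not the Minitab procedure used in the study; the function name `one_way_anova_f` is hypothetical, and the p-value and Tukey comparisons are left out because they require the F and studentized range distributions.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups
    )
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

In this setting each group would hold the 30 TVB-N replicates of one storage day. Library routines such as `scipy.stats.f_oneway` return the p-value as well.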

Meta-analysis.
Meta-analysis was performed on published studies reporting the evaluation of the TVB-N content of meat products using an HSI system (400-1000 nm). Articles were considered eligible if they had an experimental study design (original research), dealt with meat or seafood, used an HSI system (400-1000 nm), predicted TVB-N content by regression, and were written in English. On the other hand, letters, conference abstracts, patents and review articles, as well as studies that classified meat samples based on TVB-N content, were excluded. Thus, eight articles (26 case studies, since some articles compared two or more algorithms) were selected. A comprehensive literature search was carried out from 20 December 2019 to 20 January 2020 in the Google Scholar database. The following search terms were applied as keywords with Boolean operators: "hyperspectral imaging" + meat, TVB-N + "hyperspectral imaging". The search was limited to English-language articles. The meta-analysis was based on descriptive analysis (one-way analysis of variance (ANOVA)) and comparative analysis (frequency data). Statistical analysis was performed using Minitab 17.1.0 (Minitab Inc., PA, USA) 46 . Further details are provided in the supplementary materials (Supplementary Fig. S1 and Supplementary Table S1).

Statement.
All experiments and methods were performed in accordance with the approved guidelines of Shiraz University. All experimental protocols were approved by the Ethics Committee of Shiraz University of Medical Sciences and all experiments were conducted in accordance with the approved guidelines of the Iran Veterinary Organization.
The methods were carried out in accordance with the approved guidelines of the University of Sydney Ethics Committee. All experimental protocols were approved by the University of Sydney Ethics Committee.
Ethical approval. This article does not contain any studies with human participants performed by any of the authors.

Conclusions
In several previous works, the potential of various deep learning algorithms to classify seafood and meat products into two freshness grades (fresh and stale) was investigated. The results demonstrated that a hyperspectral imaging system coupled with a deep learning method had great potential for classifying the freshness of meat products, with a total classification accuracy of more than 90%. However, evaluating a freshness index as a numerical output in a regression framework is sometimes more helpful for decision making and management. The present study is the first in which a deep learning algorithm was applied to predict a freshness indicator in a regression framework. In the calibration set, the PLSR and LS-SVM models showed better performance than the LDNN algorithm. However, in the prediction set, the combination of the linear deep neural network and the hyperspectral system gave reasonable accuracy for the prediction of TVB-N content in fish fillets (R2P = 0.853; RMSEP = 3.159 and RPD = 3.001). Based on the meta-analysis, the results of the established prediction system were comparable with those of hyperspectral imaging systems based on traditional chemometric analysis. To give the LDNN model higher accuracy and capability, a large amount of training data is necessary; this is the most important challenge in evaluating food product quality with a deep learning algorithm and experimental data. Therefore, there is still much work to be done, and the results obtained by the SPA-LDNN method should encourage more research on using deep learning as a novel chemometric method for evaluating the freshness of food products.
Establishing a comprehensive database for a given fish freshness index, extracting modeling features (e.g., optimal wavebands) with a deep learning method, and using other deep learning algorithms and comparing their performance are some suggested ways to enhance the accuracy of a hyperspectral system coupled with deep learning algorithms.

Data availability
The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.