Introduction

Taiwan is suitable for the growth of various fruit trees because Taiwan is located in the subtropical region and has diverse terrain. Fruits are produced in Taiwan all year round, and more than 30 kinds of fruits are produced in Taiwan. In 2021, The production value of the agricultural products in Taiwan was US$ 9 billion1. The production value of fruits was US$ 3.35 billion, ranked first among agricultural products, and accounted for 37.2% of the total agricultural products. In 2020, the top four fruit export volumes in Taiwan were pineapple, sugar apple, mango, and wax apple2. The wax apple, with the scientific name of Syzygium samarangens, has the highest export unit price of approximately US$ 3.8 to US$ 5.8. The popular species of wax apple in Taiwan include Pink3, Palm, Ruby, Tainung No.1 Amethyst, Tainung No.2 Big Shape, and Tainung No.3 Sugar Barbie with the highest unit price4.

The price of fruits depends on their quality which is related to fruit shape, size, appearance, water content, pulp texture, soluble solids, acidity, sugar content, post-harvest fresh-keeping packaging, etc. Among them, the sugar content is an essential indicator for tasting fruits and is normally measured using a refractometer. The refractometer measuring the sugar content of fruit juices is a destructive way. Thus, it can only measure the sugar content of a sample of the fruit. However, different fruits may contain different sugar content. These differences come from different trees or positions on the tree, climate conditions, and different cultivation methods. Measuring the sugar content of each fruit non-invasive and destructively would bring at least two benefits. Cultivation methods could be optimized by observing the sugar content distribution of the fruits in each tree or a single tree on the farm without destroying these fruits. The quality of the fruits can be further graded by non-destructively measuring the sugar content to increase the economic value of fruits. Thus, the better-quality fruits can have a higher price, while the lower-quality fruits can be processed as canned or preserved food.

Hyperspectral imaging (HSI) systems collecting both reflectance spectrum and image data of a sample can be developed as a non-destructive measurement5,6,7. The deterministic methods for hyperspectral analysis include the Multiple Linear Regression (MLR), Principal Component Regression (PCR), and Partial Least Square Regression (PLSR). Peris et al.8 applied the MLR with near-infrared spectral data to measure the solids content of peaches. Mendoza et al.9 used the PLSR on hyperspectral scattering data to detect apple fruit firmness. Liang et al.10 detected the zebra chip disease in potatoes using the PLSR with near-infrared (NIR) spectroscopy. Kemps et al.11 assessed the concentration of anthocyanins, polyphenols, and the sugar content in grapes by PLSR. Chuang et al.12 proposed the Independent Component Analysis (ICA) and PLSR with NIR spectral data to quantify the sugar content of wax apples. Viegas et al.13 suggested that the total anthocyanin content (TAC) and the total phenolic compounds (TPC) could be determined by the PLR with NIR spectral data. In addition to the deterministic methods, deep learning methods have been developed to predict the quality of fruits. The convolution neural network (CNN) and the support vector machine (SVM) were proposed to detect the ripeness of strawberries14 and the maturity of citrus15. Furthermore, Tu et al.16 used the region-based CNN-based model to identify and detect the number of passion fruit in orchards. Fajardo and Whelan17 detected the fruit in orchards using CNN-based models. Marani et al.18 obtained high accuracy of grape bunch segmentation with a deep neural network.

HSI and multispectral imaging (MSI) systems act as the important instruments to conduct non-destructively analysis and detection. HSI systems measure hundreds to thousands of spectral bands with submicron-level resolution but have the highest measurement duration in all spectral systems. Contrarily, multispectral imaging (MSI) systems detect fewer bands but with a shorter measurement duration and smaller volume compared to those of HSI systems. Thus, the HSI systems are suitable to collect high-resolution hyperspectral data for spectral analysis. A dozen or several signature bands can be extracted from the hyperspectral data and can be applied in a portable MSI system for rapid and massive detection.

Several studies developed the spectral band selection or signature-band extraction based on the deep learning models, as listed in Table 1. To improve the computation duration, Zhan et al.19 eliminated the redundant bands by the distance density. To find the importance of the corresponding bands in a model, Darling et al.20 applied the Frobenius norm to obtain the value of each row vector delivering a contribution vector by getting the trained weight matrix. Mou et al.21 proposed an unsupervised deep reinforcement technique for hyperspectral band selection. Elkholy et al.22 proposed a deep-encoder-based unsupervised hyperspectral band selection method to perform classification. Cai et al.23 reduced redundant bands with contribution map-based CNN.

Table 1 Studies on signature-band extraction or selection with deep learning methods.

The deep learning models with hyperspectral data were successfully used to predict the sugar content of the Syzygium samarangense with the MAE of 0.5°Brix in our previous study24. The results indicated that the deep learning models with hyperspectral data were beneficial in conducting non-destructive prediction for the sugar content of fruits. If the sugar content can be predicted by several spectral bands, the detection time can be greatly shortened. How to extract the signature bands from hyperspectral data to conduct a rapid and accurate sugar content prediction is a critical issue that is focused in this study. To deal with this issue, this paper presents a new method to extract six signature bands from the deep learning models trained with spectral data. The six signature bands were further utilized to train new deep-learning models to perform the sugar content prediction. The models using six signature bands had the mean absolute errors (MAEs) lower than 0.5°Brix when predicting the sugar content. Thus, the resulting performance of those models using six signature bands could be as good as that of the commercial Brix meters. Furthermore, the performance of the models using the signature bands with different bandwidths in different spectral ranges was evaluated. Firstly, the bandwidth of signature bands can be optimized for sugar content prediction because the bandwidth of spectral bands measured by an MSI system can be adjusted. Secondly, the spectral range of an MSI system is associated with the sensor equipped on this system. For example, the spectral response of a silicon-based complementary metal–oxide–semiconductor sensor with and without an infrared filter covers the visible (VIS) spectrum ranging from 400 to 700 nm and the visible to near-infrared (VISNIR) spectrum ranging from 400 to 1000 nm, respectively. The spectral response of an indium gallium arsenide sensor covers the short-wave infrared (SWIR) ranging from 900 to 2500 nm. Thus, which spectral range and bandwidth of signature bands can be employed to effectively identify the sugar content is a novel exploration in this paper.

The sugar content prediction of Syzygium samarangense was taken as an example in this paper. “Materials and data pre-processing” introduces how we collected the hyperspectral data of the peel surface of wax apples and performed the pre-processing of the hyperspectral data. “Proposed methods” addresses the method and procedure of signature-band extraction from hyperspectral data. This section includes three subsections. Firstly, CNN and FNN models were trained and tested with the spectral data of five bandwidths in the four wavelengths, respectively. Secondly, the signature bands were assessed from these well-trained models. Thirdly, new CNN and FNN models were trained with the signature-band data and evaluated on the performance of sugar content prediction, respectively. Thus, “Experimental results” presents the experimental details and results of three subsections in “Proposed methods”. “Conclusions and discussion” provides a comprehensive discussion of the results and highlights the important conclusions and future works drawn from this study.

Materials and data pre-processing

Sample preparation

This study takes fruits of wax apple, whose scientific name is Syzygium samarangense, as an example of hyperspectral signature-band extraction for sugar content prediction. The species of these apples were Tainung No.3 Sugar Barbie4. All data collection procedures were conducted at the Fengshan Tropical Horticultural Experiment Branch (FTHEB), Kaohsiung, Taiwan. 136 wax apples were purchased from Meishan, Fengshan, Liouguei, and Jiadong and refrigerated in a laboratory at FTHEB. The water content of each wax apple needs to keep consistent because the water content affects the spectral reflectance of the wax apples as well as the temperature of wax apples. Thus, refrigerated wax apples were placed in a 25 °C environment and waited for their temperature to return to 25 °C. Afterward, the wax apples were chopped into 16 slices with two vertical and three horizontal cuts, as shown in Fig. 1.

Figure 1
figure 1

(a) An intact wax apple marked with cut lines and (b) slices of three wax apples fixed on a flat plate for hyperspectral measurements.

Hyperspectral data collection

The hyperspectral data collection was conducted in a dark room and performed by the coaxial heterogeneous HSI system25. This system can concurrently acquire the VIS and SWIR spectral data of the wavelength ranging from 400 to 1700 nm. The raw hyperspectral image is three-dimensional data (W, L, Λ), where W is image width, L is image length, Λ is the number of spectral bands. The spectral data were divided into four wavelength ranges for spectral analysis, including 400–1700 nm, 400–700 nm, 400–1000 nm, and 900–1700 nm where the band numbers of the hyperspectral data in these four spectral ranges are 1367, 575, 1053, and 424, respectively.

Destructive sugar content measurement

After the hyperspectral data collection, the wax apple slices were squeezed to extract their juice. The sugar contents of these juice were measured by a commercial refractometer ATAGO PAL-1. Organisation Internationale de métrologie légale26 recommended that the refractometers can be used to determine the sugar content of fruit juices. The refractometer provides the index of °Brix, which correlates to total soluble solids (TSS) concentration. The TSS of the juice is mainly composed of glucose, fructose, and sucrose. Thus, TSS concentration in juice can represent its sugar content.

Spectral calibration and denoising

To eliminate the effect of the dark offset and system response, the raw hyperspectral data, Is(x, y, λ), minus dark, ID(x, y, λ), and divided by the spectrum of a standard calibration whiteboard, I(x, y, λ). Thus, the reflectance R(x, y, λ) could be derived from Eq. (1), written as

$$R\left( {x,y,\lambda_{i} } \right) = \frac{{I_{S} \left( {x,y,\lambda_{i} } \right) - I_{D} (x,y,\lambda_{i} )}}{{I_{w} \left( {x,y,\lambda_{i} } \right) - I_{D} (x,y,\lambda_{i} )}}.$$
(1)

The noise of the reflectance data was reduced by adopting a Savitzky–Golay filter, written as

$$R_{SGF} (x,y,\lambda_{i} ) = \sum\limits_{k = (1 - w)/2}^{(w - 1)/2} {C_{k} R(x,y,\lambda_{i + k} )} .$$
(2)

Data augmentation

The bottom part of wax apples has the highest sugar content. Thus, the bottom slices were used for sugar content modeling and prediction. A total of 1034 slices were recruited for spectral analysis. The image width and length of the slices are approximately 30–100 pixels and 70–160 pixels, respectively. An area of 20 by 20 pixels in each slice was used. Thus, the hyperspectral cubic data of each slice had the size of 20 × 20 × 1367. These three-dimensional data were the input of the CNN model. For the FNN model, 20 by 20 values in each band were averaged. Thus, the three-dimensional data were transferred to the one-dimension array with the size of 1 × 1367.

Each slice is seen as an individual sample which is measured with a °Brix value. The °Brix of the samples mostly ranged between 8 to 14. The samples were divided into five groups to observe the regression results of sugar content according to their °Brix values. The taste of fruits below the °Brix of 10 is not actually sweet. Thus, the first group contained the samples whose °Brix value was lower than 10. The second, third, and fourth groups contained the samples whose °Brix value was between 10 to 11, 11 to 12, and 12 to 13, respectively. The fifth group contained the samples whose °Brix value was larger than 13. Basically, 218 samples in most of °Brix levels were randomly selected for sugar content regression and divided into training, validation, and test sets, as shown in Table 2. The sample numbers of training, validation, and test sets were 131, 44, and 43, respectively.

Table 2 Some statistics of experimental materials, including the mean and standard deviation in each °Brix interval of samples and the sample numbers of training, validation, and test sets in each data type and spectral range. STD denotes the standard deviation of °Brix values.

Proposed methods

This study aims to extract the signature bands from hyperspectral data for rapid sugar content prediction. The bandwidth of the hyperspectral data measured by the coaxial heterogeneous HSI system is nanometer to sub-nanometer. In contrast, the multispectral data measured by an MSI system is approximately greater than 5 nm. As shown in Fig. 2, block A introduces that the hyperspectral data are converted to spectral data with multiple bandwidths to simulate multispectral data. The spectral data are used to create the CNN and FNN-based sugar content prediction models. The models are trained, verified, and tested. Block B illustrates that these models are used to extract the signature bands by the absolute mean of the integrated gradients method. Block C depicts that the CNN-based and FNN-based models are trained, verified, and tested with signature-band data for sugar content prediction.

Figure 2
figure 2

Procedure of the proposed method for signature-band extraction.

Modeling using multispectral data

Hyperspectral data conversion

In the coaxial heterogeneous HSI system, the VIS spectrometer has a spectral resolution of ~ 0.5 nm in the wavelength ranging from 400 to 1000 nm; the SWIR spectrometer has a spectral resolution of ~ 2.5 nm in the wavelength ranging from 900 to 1700 nm. The hyperspectral data were converted to spectral data, RM(x,y,λc), which is written as

$${\text{R}}_{M} \left( {x,\;y,\;\lambda_{c} } \right) = \frac{1}{{\lambda_{b} - \lambda_{a} }}\left[ {\sum\limits_{i = a}^{b} {{\text{R}}_{SGF} \left( {x,y,\lambda_{i} } \right) \times \left( {\lambda_{i + 1} - \lambda_{i} } \right)} } \right]{, }\,[\lambda_{a} ,\lambda_{b} ] \in \left[ {\lambda_{c} - \frac{w}{2},\lambda_{c} + \frac{w}{2}} \right]{ ,}$$
(3)

where RM(x,y,λc) is the mean of the integrated intensity of the central band, λc; the central band λc ranges from 400 to 1700 nm with an interval of bandwidth, w.

A total of six sets of spectral data were converted from hyperspectral data according to the six bandwidths, \(w\), of ± 2.5 nm, ± 5 nm, ± 7.5 nm, ± 10 nm, ± 12.5 nm, and ± 15 nm. Each set was grouped into four subsets based on the spectral range of 400–1700 nm, 400–1000 nm, 400–700 nm, and 900–1700 nm, as shown in Table 3.

Table 3 Band numbers of hyperspectral data and multispectral data with six bandwidths in four spectral ranges.

We apply the exponential learning rate decay method in Eq. (4) with the initial learning rate lr of the CNN and FNN models. The exponential learning rate decay method is written as

$$lr = lr \times \exp^{( - k \times j)} ,$$
(4)

where lr is the learning rate, and \(k\) is the decaying rate; j is the iteration number of the epoch. The performance of the CNN and FNN models for sugar content prediction was evaluated by the mean absolute error, MAE, which is defined as

$${\text{MAE}} = \frac{{\sum\nolimits_{i = 1}^{n} {\left| {y_{i} - \hat{y}_{i} } \right|} }}{n},$$
(5)

where yi is the prediction value of the sugar content, \(\widehat{y}\) is the true value of the sugar content, and n is the sample size. The goodness-of-fit of the CNN and FNN models was assessed by the R-squared value which is defined by

$${\text{R}}^{2} = 1 - \frac{{\sum\nolimits_{i = 1}^{n} {\left( {y_{i} - \hat{y}_{i} } \right)^{2} } }}{{\sum\nolimits_{i = 1}^{n} {\left( {y_{i} - \overline{y}} \right)^{2} } }},$$
(6)

where \(\overline{y }\) is the mean of the prediction values. An R2 value of zero indicates that the regression of a model doesn’t fit the data properly, while an R2 value of 1 indicates that the regression of the model fits perfectly.

CNN modeling and verification using multispectral data

The architecture of the CNN model for modeling multispectral data is shown in Fig. 3. The input of this model is three-dimensional data (W, L, Λ). The number of bands, Λ, was tens to hundreds. The proposed CNN model is composed of 2D convolution layers (Conv2D) with L2-norm regularization, Batch Normalization layers (BN), Dropout layers (DP), and Fully-Connected layer (FC). All the neurons were activated by the Rectified Linear Unit (ReLU), and the loss is measured by the Root Mean Squared Logarithmic Error (RMSLE).

Figure 3
figure 3

The proposed convolution neural network model with the input of three-dimensional spectral data for sugar content prediction.

FNN modeling and verification using multispectral data

The architecture of the FNN model for modeling multispectral data is shown in Fig. 4. The input of the model is one-dimensional data with a size for tens to hundreds of bands. The CNN model consists of 4 FC-ReLU-BN-DP layers. The final output Y is the sugar content prediction result of the model. The FNN model is similar to the proposed CNN model. The weights with the smallest RMSLE are recorded for the test datasets.

Figure 4
figure 4

The proposed feedforward neural network model with the input of one-dimensional spectral data for sugar content prediction.

Signature-band extraction from multispectral data

According to the integrated gradient defined in Ref.27, the mean of the integrated gradients (MIG), derived from \(S\) samples, is written as

$${\text{MIG}} = \frac{1}{S} \times \frac{1}{M} \times \sum\limits_{i = 0}^{S} {\sum\limits_{k = 1}^{M} {\frac{{\partial F\left[ {R_{i}^{\prime } + \frac{k}{M} \times (R_{i} - R_{i}^{\prime } )} \right]}}{{\partial r_{i} }}} } ,$$
(7)

where R is the input data, and M is the number of intervals along the path from the baseline from R to R. Most deep learning frameworks could efficiently perform the calculation of the gradient.

$${\text{Score}} (\lambda_{C} ) = \left| {\left. {\frac{1}{W \times L}\sum\limits_{W} {\sum\limits_{L} {{\text{MIG}} \left( {W,L,\lambda_{C} } \right)} } } \right|} \right..$$
(8)

The MIG of the CNN models is a three-dimensional matrix. The three-dimensional MIG is averaged along the width, W, and length, L. Afterward, a one-dimensional array, Score, with the data points corresponding to the bands, λc, is obtained, as defined in Eq. (8). The MIG of the FNN models is a one-dimensional matrix. The Score of the FNN is the absolute of the MIG. The Score represents the contribution score of bands in a deep learning model. The significance of a band in a model is positively correlated to the Score of this band.

The band with the first highest Score is selected for the first signature band. The second signature band is selected according to the band which has the second highest Score and is out of the ± 20 nm range of the first selected band, and so on.

Modeling using signature bands

The hyperspectral data of the signature bands are resampled into the spectral data with the bandwidths of ± 25 nm, ± 20 nm, ± 15 nm, ± 10 nm, and ± 5 nm. The spectral data are used to train new CNN and FNN models for sugar content prediction. The input data of the CNN and FNN models were three-dimensional hyperspectral and one-dimensional spectral data of the signature bands, respectively. The band number of a signature-band set is six.

The architecture of the CNN model for modeling resampled spectral data is shown in Fig. 5. The input of this model is three-dimensional data (W, L, Λ). The number of bands, Λ, was six. The CNN model consists of two Conv-ReLu-BN layers with 64 kernel filters, two Conv-ReLU-BN layers with 64 kernel filters, one Average Pooling layer, two FC-ReLU-BN-DP layers, and the output layers with the rectified linear unit (ReLU) as the activation function. Because the number of bands, Λ, diminished to 6, some layers were reduced and the Average Pooling instead of Max-Pooling for the smaller size resampled data was used.

Figure 5
figure 5

The proposed convolution neural network model with the input of three-dimensional multispectral data for sugar content prediction.

The architecture of the FNN model for modeling multispectral data is shown in Fig. 6. The input of the model is one-dimensional data with a size of six bands. The FNN model consists of 4 FC-ReLU-BN layers. The final output Y is the sugar content prediction result of the model. The FNN model is similar to the proposed CNN model; the weights with the smallest RMSLE are recorded for the test datasets.

Figure 6
figure 6

The proposed feedforward neural network model with the input of one-dimensional signature-band data for sugar content prediction.

Experimental results

Results of the modeling using multispectral data

Modeling with the spectral data in different wavelength ranges might have various results; thus, the spectral range was divided into four spectral ranges, including VIS (400–700 nm), VISNIR (400–1000 nm), SWIR (900–1700 nm), and VISWIR (400–1700 nm). The hyperparameters of the CNN and FNN models are shown in Tables 4 and 5. The CNN and FNN models using spectral data of six bandwidths assumed the same ten hyperparameters and nine hyperparameters, respectively. The exponential learning rate decay method is applied with the initial learning rate, and the decaying rate k is 1 × 10–6. Adam optimizer was also applied. In the CNN model, the regularization parameter is 1 × 10–10 for L2 regularization.

Table 4 Selected hyperparameters of the convolution neural network model with multispectral data.
Table 5 Selected hyperparameters of the feedforward neural network model with multispectral data.

The prediction results of the CNN and FNN models using multispectral data with six bandwidths in four spectral ranges are shown in Table 6. The CNN and FNN models using the spectral data of ± 2.5 nm bandwidth had the worst prediction performance compared to those using the spectral data with the other bandwidths. Furthermore, the CNN and FNN models using the spectra in the SWIR spectral range had the worst prediction performance compared to those using the spectra in the other spectral ranges. Thus, the spectral data of ± 2.5 nm bandwidth and the spectra in the SWIR spectral range were excluded from the signature-band extraction.

Table 6 Sugar content prediction results of the convolution neural network and feedforward neural network models using the multispectral data of six bandwidths in four spectral ranges. MAE denotes the mean absolute error of the predictions. R2 denotes the R-squared values of the models. Bold values denote the best results of the sugar content prediction in the CNN and FNN models.

Results of signature-band extraction

Thirty deep learning models were trained by the spectral data with five bandwidths in three spectral ranges. The five bandwidths included ± 5 nm, ± 7.5 nm, ± 10 nm, ± 12.5 nm, and ± 15 nm, and the three spectral ranges were 400–700 nm, 400–1000 nm, and 400–1700 nm. The contribution of the bands used by each model was assessed by the Score, which is the absolute of the MIG, according to Eqs. (7) and (8). The top six spectral bands with the highest Score values of each model are listed in Tables 7, 8, 9, 10 and 11. Each entry in these tables includes the spectral band and its MIG value. The absolute MIG values of the FNN model using spectral data with a bandwidth of ± 5 nm in a spectral range of 400–700 nm are shown in Fig. 7. The six signature bands in this FNN model were selected according to the criteria introduced in sub-section B of section III and marked in the bars with a slash texture.

Table 7 Top six Score ranked spectral bands associated with the convolution neural network and feedforward neural network models using the spectral data with a bandwidth of ± 5 nm in three spectral ranges. The Socre is the absolute mean of the integrated gradients.
Table 8 Top six Score ranked spectral bands associated with the convolution neural network and feedforward neural network models using the spectral data with a bandwidth of ± 7.5 nm in three spectral ranges. The Socre is the absolute mean of the integrated gradients.
Table 9 Top six Score ranked spectral bands associated with the convolution neural network and feedforward neural network models using the spectral data with a bandwidth of ± 10 nm in three spectral ranges. The Socre is the absolute mean of the integrated gradients.
Table 10 Top six Score ranked spectral bands associated with the convolution neural network and feedforward neural network models using the spectral data with a bandwidth of ± 12.5 nm in three spectral ranges. The Socre is the absolute mean of the integrated gradients.
Table 11 Top six Score ranked spectral bands associated with the convolution neural network and feedforward neural network models using the spectral data with a bandwidth of ± 15 nm in three spectral ranges. The Socre is the absolute mean of the integrated gradients.
Figure 7
figure 7

Absolute mean of the integrated gradients of the feedforward neural network model with a bandwidth of ± 5 nm in a spectral range of 400–700 nm. The top six Score ranked bands are shown by the bars with slash texture and selected as the signature bands, as listed in Table 7. The blue bars represent that the MIG value of the bands is positive. The red bars represent that the MIG value of the bands is negative.

Results of modeling using signature bands

Each model trained by the spectral data with a bandwidth in a spectral range had one set of the six signature bands with the six highest absolute MIGs, as listed in Tables 7, 8, 9, 10 and 11. Thus, a total of thirty sets of spectral bands associated with the bandwidths of ± 25 nm, ± 20 nm, ± 15 nm, ± 10 nm, and ± 5 nm in three spectral ranges of 400–1700 nm, 400–1000 nm, and 400–700 nm were selected as signature bands. The performance of the CNN and FNN models trained using these thirty sets of signature bands was further evaluated. The eight hyperparameters of the CNN and FNN models trained by these signature bands with the lowest MAE in each spectral range are shown in Tables 12 and 13, respectively. The exponential learning rate decay method is applied with the initial learning rate, and the decaying rate k is 1 × 10–6. Adam optimizer was also applied. In the CNN model, the regularization parameter is 1 × 10–10 for L2 regularization.

Table 12 Selected hyperparameters of the convolution neural network models using the signature bands with the lowest prediction error in each spectral range.
Table 13 Selected hyperparameters of the feedforward neural network models using the signature bands that had the lowest prediction error in each spectral range.

The sugar content prediction results of the CNN and FNN models using the signature bands are shown in Table 14. The FNN model using the VISWIR signature bands with a bandwidth of ± 12.5 nm had a minimum MAE of 0.390° Brix compared to the other FNN models. The CNN model using the VISWIR signature bands with a bandwidth of ± 10 nm had the lowest MAE of 0.549° Brix compared to the other CNN models. The MAEs of the FNN models were significantly lower than those of the CNN models.

Table 14 Sugar content prediction errors of the convolution neural network and feedforward neural network models using the signature bands with the bandwidths of ± 5 nm, ± 7.5 nm, ± 10 nm, ± 12.5 nm, and ± 15 nm in three spectral ranges of 400–700 nm, 400–1000 nm, and 400–1700 nm. MAE denotes the mean absolute error of the predictions. R2 denotes the R-squared values of the models. Bold values denote the best results of the sugar content prediction in the CNN and FNN models.

Conclusions and discussion

This study proposes a signature-band extraction method using the absolute mean of the integrated gradients score to extract signature bands from sugar-content deep-learning models with the input of spectral data. Firstly, the hyperspectral data with bandwidths lower than 2.5 nm were converted to spectral data with bandwidths of ± 2.5 nm, ± 5 nm, ± 7.5 nm, ± 10 nm, ± 12.5 nm, and ± 15 nm. The spectral data were used to train and verify CNN and FNN models. The CNN model using the VISWIR spectral data with a bandwidth of ± 12.5 nm had a minimum MAE of 0.400°Brix compared to that of the other CNN models. Furthermore, the FNN model using the VISNIR spectral data with a bandwidth of ± 15 nm had a minimum MAE of 0.408°Brix compared to the other FNN models.

The absolute MIG method was used to extract signature bands from the CNN and FNN models. The MIG of the deep learning model corresponding to an input of spectral bands can be positive or negative. The MIG implies that the input bands have either positively or negatively correlated to the output of the model. The six spectral bands with the highest positive absolute MIGs were considered to be signature bands. However, the FNN and CNN models trained by these signature bands didn’t have lower prediction errors compared to the models trained by the six signature bands with the highest absolute MIGs. Thus, this study finds the signature bands with the highest absolute MIGs instead of finding the signature bands with the highest positive MIGs. The central wavelength of the top six bands selected from the CNN models in the VISWIR range was not over 1000 nm (Tables 7, 8, 9, 10 and 11). Only one of each six bands over 1000 nm was chosen from the FNN models in the VISWIR range. The results imply that the signature bands in our deep learning models for predicting the sugar content of the wax apples are significantly located in the VIS range. This finding seems to show that the VIS spectrum of the appearance of the wax apples might have correlated considerably with the sugar content of the wax apples.

Thirty sets of six signature bands of spectral data in five bandwidths were used to train the CNN and FNN models for sugar content prediction. The input data of the CNN and FNN models were three-dimension and one-dimension, respectively. The minimum MAE of the CNN and FNN models using six signature bands were 0.549°Brix and 0.390°Brix. Both results were less than 0.55°Brix; and close to or better than that of the CNN and FNN models using hundreds of spectral bands or thousands of spectral bands. The performance of the FNN models was almost better than that of the CNN models in the three spectral ranges of 400–1700 nm, 400–1000 nm, and 400–700 nm. The correlation between two adjacent bands is reduced in spectral data compared to hyperspectral data; the CNN model tends to find the correlation between two spectral bands; this might be why the performance of the CNN model was worse than that of the FNN model. The results reveal that the FNN model with one-dimensional input data might perform better on the sugar content prediction than the CNN model with three-dimensional input data. Furthermore, the CNN and the FNN models using only six signature bands have a high potential to predict the sugar content of wax apples. These six signature bands could be used in an MSI system to non-destructively and rapidly predict the sugar content of the wax apples in the future.

The spectral data was not overlapped in the spectrum when it was converted from the hyperspectral data in this study. However, the spectrum of each spectral band could overlap. Spectral data with overlapped spectrums could be considered for signature-band extraction. Furthermore, the spectral bands used in an MSI system could have different bandwidths. Thus, the signature bands might be chosen from the bands with different bandwidths. These two variables greatly increase the complexity of the band extraction. In the future, an artificial intelligence model may perform signature-band extraction from spectral data with different bandwidths or overlapped spectrums.