Quantitative measurement of internal quality of carrots using hyperspectral imaging and multivariate analysis

Mulowayi, Arcel Mutombo; Shen, Zhen Hui; Nyimbo, Witness Joseph; Di, Zhi Feng; Fallah, Nyumah; Zheng, Shu He

doi:10.1038/s41598-024-59151-y

Article
Open access
Published: 12 April 2024

Arcel Mutombo Mulowayi^1,4,
Zhen Hui Shen^1,2,4,
Witness Joseph Nyimbo³,
Zhi Feng Di^1,4,
Nyumah Fallah³ &
…
Shu He Zheng^1,4

Scientific Reports volume 14, Article number: 8514 (2024)

Abstract

The study aimed to measure the carotenoid (Car) and pH contents of carrots using hyperspectral imaging. A total of 300 images were collected using a hyperspectral imaging system, covering 472 wavebands from 400 to 1000 nm. Regions of interest (ROIs) were defined to extract average spectra from the hyperspectral images (HSI). We developed two models, least squares support vector machine (LS-SVM) and partial least squares regression (PLSR), to establish a quantitative relationship between the pigment amounts and the spectra. The spectra and pigment contents were predicted and correlated using these models. Effective wavelengths (EWs) for modeling were selected using the successive projections algorithm (SPA), regression coefficients (RC) from PLSR models, and LS-SVM. The results demonstrated that hyperspectral imaging could effectively evaluate the internal attributes of carrot cortex and xylem. Moreover, these models accurately predicted the Car and pH contents of the carrot parts. This study provides a valuable approach for variable selection and modeling in hyperspectral imaging studies of carrots.

Introduction

Carrot (Daucus carota L.) is a widely consumed root vegetable crop known for its high nutritional value, including essential micronutrients such as vitamins A and C^1,2. Carrot production is rising worldwide, with China as the top producer³. Although carrots are typically orange, they also occur in other colors, including purple, red, and yellow⁴. Moreover, these crops provide significant amounts of antioxidants, provitamin A, and carotenoids, which have been linked to various health benefits, including a lower risk of prostate cancer and improved heart and liver health^5,6,7,8.

With its unique pH value, carrot juice is susceptible to spoilage and pathogenic organisms⁹. Key quality indicators for carrots include color, absence of bruises, provitamin A content, vitamin C levels, and firmness, all of which affect shelf life, market value, and consumer satisfaction¹⁰. Given rising consumption and the effects of climate change, it is crucial to enhance carrot quality inspection and to develop rapid quality control technologies that give precise and detailed information about nutritional content^11,12. This information can be used to determine the most suitable time for harvesting, refine storage parameters, and enhance the nutritional quality of processed carrot products.

Hyperspectral imaging (HSI) merges conventional spectroscopy and digital imaging technology into a single system that simultaneously collects spectral and image data from the tested sample^13,14,15. HSI technology is used in various fields, including agriculture, food¹⁶, environmental management, and urban planning, and it can provide substantial information in both the spectral and spatial domains¹⁷. In recent years, HSI technology has played a pivotal role in detecting the internal quality of agricultural products, ranging from moisture and starch content detection¹⁸ to protein and fat analysis. Furthermore, HSI has been used to investigate crop diseases¹⁹ and nutrient deficiency²⁰, and to estimate biochemical and biophysical characteristics essential for understanding vegetable physiological status and predicting crop yields. Moreover, this tool can investigate soil properties, including moisture content, organic matter, and carbon content^21,22, as well as total capsaicinoids²³ and pH²⁴. Munera et al.^25,26 mentioned that the evaluation of fruit quality is a recently developed application. For instance, studies on the quality detection of bakery goods, meat, and fresh vegetables have already been published²⁷.

Visible/near-infrared HSI technology has been extensively employed in the non-destructive assessment of internal fruit attributes, including soluble solids content (SSC) and firmness. Nevertheless, research on predicting Car and pH content in specific regions, such as the cortex and xylem, remains limited. To comprehensively evaluate the internal quality attributes of carrots, this study investigated the potential of hyperspectral reflectance imaging for predicting the Car and pH content of carrots in two central regions (cortex and xylem) using visible and near-infrared (Vis/NIR) HSI. The specific objectives of this study were to:

1. Acquire hyperspectral images of carrot samples and extract spectral data from them.
2. Build partial least squares regression (PLSR) and least squares support vector machine (LS-SVM) models using the entire spectrum.
3. Choose representative wavelengths using the successive projections algorithm (SPA) and regression coefficients (RC) from PLSR.
4. Develop simplified LS-SVM and PLSR models.
5. Use the best model to predict the quality attributes of each sample pixel and compare its performance to Fig. 1.

Methods

Sample preparation

A stratified sampling approach was applied to select carrot samples for analysis. A total of 300 carrot samples of comparable shape and size were procured from Putian (Pt) and Fuzhou (Fz) City in Fujian Province, China. Each sample weighed between 55 and 65 g.

After thorough washing, carrots that exhibited cracks, rust, dysmorphia, or dark discoloration were excluded from the sample set. As a result, 240 samples remained, all meeting the predefined quality criteria. Of these, 120 carrots were sourced from Fz, while the remaining 120 originated from Pt. The carrots were stored in sealed plastic bags at 3 °C for 2 days. Later, each carrot was divided into two halves to investigate Car and pH contents. The plant material used in this work complies with relevant institutional, national, and international guidelines and legislation.

HSI system and image acquisition

Experimental HSI was conducted using a high-performance CCD digital camera (Sencicam QE Taiwan) and a hyperspectral camera (HIS-V10E-sCMOS) that covered the wavelength range of 400–1000 nm with a spectral resolution of 2–8 nm. The system was also equipped with halogen tungsten light bulbs (Oriel Instruments, USA), a spatial resolution point radius of 9 m, a light source supply system with a feedback controller, and a computer. The camera was operated using Camera Control Kit V219 and was used to capture the hyperspectral images. It had a focal length of 170 mm and a scanning line distance of 2 mm, and the optical center of the light source beam was located 2 mm from the scanning line. To obtain a full spectral image of each carrot, we integrated data from four evenly spaced positions around the equator using a 22-binning technique. Fig. 1 shows a schematic of the HSI equipment.

Image processing

One of the most important steps in pre-processing the hyperspectral images was calibrating the raw data to remove dark current effects from the CCD camera. After calibration, a region of interest (ROI) was defined in the calibrated images, and spectral data were then extracted from these ROIs, as Fig. 2 shows. To reduce differences caused by illumination, detector sensitivity, camera specifications, and subtleties in the physical setup, the raw hyperspectral images were corrected against black and white reference images²⁸.

To provide a black reference image, the camera lens was covered with its opaque cap and the light source was turned off. To create a white reference image, a spectral image of a uniformly white tile with approximately 99.9% reflectance was captured²⁹. The following equation was used to correct the raw hyperspectral images:

$$R=\frac{I-{I}_{d}}{{I}_{w}-{I}_{d}}$$

(1)

Here, $R$ represents the corrected hyperspectral image, $I$ represents the raw spectral image of the sample, ${I}_{d}$ denotes the dark reference image, and ${I}_{w}$ stands for the white reference image. We used the image acquisition software to apply the correction.
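As an illustration only, the correction in Eq. (1) and the averaging of ROI spectra can be sketched in plain Python. The function names, the digital counts, and the guard against a zero denominator are assumptions made for this example; the study itself applied the correction in its image acquisition software.

```python
def calibrate_reflectance(raw, dark, white, eps=1e-12):
    """Apply R = (I - I_d) / (I_w - I_d) band by band to one pixel spectrum."""
    corrected = []
    for i_raw, i_dark, i_white in zip(raw, dark, white):
        span = i_white - i_dark
        # Assumption: a band where white and dark readings coincide is set to 0.
        corrected.append((i_raw - i_dark) / span if abs(span) > eps else 0.0)
    return corrected


def mean_roi_spectrum(pixel_spectra):
    """Average the calibrated spectra of all pixels inside one ROI."""
    n_pixels = len(pixel_spectra)
    n_bands = len(pixel_spectra[0])
    return [sum(px[b] for px in pixel_spectra) / n_pixels for b in range(n_bands)]


# Hypothetical digital counts for three wavebands and two ROI pixels.
dark = [100.0, 100.0, 100.0]
white = [1100.0, 2100.0, 4100.0]
roi_pixels = [[600.0, 1100.0, 1100.0], [350.0, 600.0, 2100.0]]

calibrated = [calibrate_reflectance(px, dark, white) for px in roi_pixels]
roi_mean = mean_roi_spectrum(calibrated)
print(roi_mean)  # [0.375, 0.375, 0.375]
```

Each ROI then contributes one mean reflectance spectrum, which is the form of data passed on to pre-treatment and modeling.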

Spectral pre-treatment

The hyperspectral data were extracted from the acquired ROIs for spectral processing. Pre-treatment compensated for undesired variations (negative effects from random and systematic noise) and removed unnecessary or noisy wavelengths to improve prediction accuracy. The operations applied to the spectral data included smoothing, derivatives, multiplicative scatter correction (MSC)^14,15, standard normal variate (SNV), and Savitzky–Golay (SG)³⁰. SG smoothing with a window width of three points was used to reduce high-frequency noise, baseline excursion, and dispersion, thereby stabilizing the baseline. Moreover, MSC was applied to correct for additive and multiplicative scatter effects in the obtained hyperspectral data.

The SG filter³¹ is a widely used technique for smoothing data based on approximating the raw data with polynomials within a defined window. The SG filter has two degrees of freedom: polynomial order and window length. A higher polynomial order lets the smoothed data follow the raw data more closely, which preserves edges in the data but has the drawback of also tracking noise fluctuations. The window length controls how strongly high-frequency noise is suppressed, because the fluctuations are smoothed by polynomial fitting over the window^32,33. The SG filter searches for the optimal n + 1 polynomial coefficients of a given n-degree polynomial to best fit the raw data and evaluates the result at the window center^34,35. The polynomial function was applied to the signal point by point, and the measured value at the window's midpoint was replaced with the value estimated by the polynomial. The degree of smoothing was altered by changing the window's width and polynomial order. In addition to SG smoothing, other spectral pre-treatment methods, such as MSC and SNV, are commonly used to compensate for undesired variations and remove unnecessary or noisy wavelengths^14,15,30.
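The pre-treatments named above can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions: the polynomial order of the SG filter is not given in the text, so the example assumes order 0 or 1, for which a three-point SG smoother reduces to a three-point moving average; the endpoints are left unchanged for simplicity; and MSC uses the mean spectrum of the set as its reference. All function names are hypothetical.

```python
import math


def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own statistics."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in spectrum) / (n - 1))
    return [(v - mean) / sd for v in spectrum]


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum of the set."""
    n_bands = len(spectra[0])
    ref = [sum(s[b] for s in spectra) / len(spectra) for b in range(n_bands)]
    ref_mean = sum(ref) / n_bands
    ref_var = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = sum(s) / n_bands
        # Least-squares fit s = intercept + slope * ref, then invert it.
        slope = sum((r - ref_mean) * (v - s_mean) for r, v in zip(ref, s)) / ref_var
        intercept = s_mean - slope * ref_mean
        corrected.append([(v - intercept) / slope for v in s])
    return corrected


def smooth_three_point(spectrum):
    """Three-point SG smoothing for polynomial order 0 or 1 (a moving average)."""
    out = list(spectrum)
    for i in range(1, len(spectrum) - 1):
        out[i] = (spectrum[i - 1] + spectrum[i] + spectrum[i + 1]) / 3.0
    return out


# Hypothetical spectra: the second is the first with added offset and gain.
base = [1.0, 2.0, 4.0, 3.0]
shifted = [2.0 * v + 1.0 for v in base]
print(msc([base, shifted]))
print(snv(base))
print(smooth_three_point([0.0, 3.0, 0.0, 3.0, 0.0]))
```

With a window of three points and polynomial order 2, the fitted polynomial would pass through all three points and leave the data unchanged, which is why a lower order is assumed here.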

Effective wavelength selection methods

The spectrum data set may comprise thousands of variables/wavelengths and hundreds or thousands of samples^36,37 due to the high resolution of modern spectroscopic instruments. Such large-scale data can make hyperspectral image inspection techniques more time-consuming. Moreover, variable selection (wavelength selection) is crucial in identifying the relevant variables and eliminating highly correlated ones to reduce computational complexity, increase detection effectiveness, and meet the industry-required inspection speed^38,39. While no definitive method has been established for selecting optimal wavelengths, various approaches have been recommended⁴⁰. For instance, SPA, RC, uninformative variable elimination (UVE), simulated annealing (SA), K-nearest neighbors regression (K-NNR), and genetic algorithm (GA) are a few multivariate algorithms that have been suggested for developing quantitative models.

In this study, the wavelength selection techniques utilized included RC, K-NNR, and SPA. SPA is a forward selection method that identifies relevant features by comparing the projection vectors obtained when wavelengths are projected onto other wavelengths. It selects the wavelength with the largest projection vector and includes it in the candidate subset of characteristic wavelengths⁴¹. Here, the performance of different subsets was evaluated using a regression model. SPA aims to identify a combination of variables that contains the least redundant information and the least collinearity, thereby reducing model complexity and improving accuracy. Overall, SPA is a useful tool for feature selection in various applications, such as regression, classification, and data mining.
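The projection step of SPA can be sketched as below. This is a simplified illustration, not the authors' implementation: it builds a single chain of wavelengths from one assumed starting band, and it omits the later stage in which candidate subsets are compared with a regression model. The function name, the starting index, and the toy matrix are hypothetical.

```python
def spa_select(X, n_select, start=0):
    """Return indices of n_select wavebands chosen by successive projections.

    X is a list of spectra (rows = samples, columns = wavebands).
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    n_samples, n_bands = len(X), len(X[0])
    # Work on one column vector per waveband.
    cols = [[X[i][j] for i in range(n_samples)] for j in range(n_bands)]
    selected = [start]
    for _ in range(n_select - 1):
        ref = cols[selected[-1]]
        ref_norm2 = dot(ref, ref)
        if ref_norm2 == 0.0:
            break  # nothing left to project onto
        best_j, best_norm2 = None, -1.0
        for j in range(n_bands):
            if j in selected:
                continue
            # Project column j onto the subspace orthogonal to the last pick.
            coef = dot(cols[j], ref) / ref_norm2
            cols[j] = [c - coef * r for c, r in zip(cols[j], ref)]
            norm2 = dot(cols[j], cols[j])
            if norm2 > best_norm2:
                best_j, best_norm2 = j, norm2
        if best_j is None:
            break
        selected.append(best_j)
    return selected


# Toy data: band 1 is an exact multiple of band 0, band 2 is independent.
X_toy = [[1.0, 2.0, 0.0],
         [0.0, 0.0, 1.0],
         [0.0, 0.0, 0.0]]
print(spa_select(X_toy, n_select=2, start=0))  # [0, 2]
```

In the toy example the collinear band is skipped even though it has the largest raw magnitude, which is the redundancy-reducing behaviour described above.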

RC plays a decisive role in creating a predictive model for a specific data set. The weighted regression coefficients, also known as b-coefficients, are taken from the PLSR model built on the full spectra. The best wavelengths are determined by selecting those with the highest absolute b-coefficient values. This approach identifies the wavelengths most important for predicting the response variable, leading to a more accurate and effective model⁴². Using fewer wavelengths in spectral analysis has the potential to improve model performance²⁴. After the EWs were selected, the reduced wavelength sets obtained with RC and SPA were used for modeling.
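The RC rule described above, keeping the wavelengths with the largest absolute b-coefficients, can be sketched in a few lines. The wavelengths, coefficient values, and the number of bands kept are hypothetical; in practice the coefficients would come from the full-spectrum PLSR model, and analysts often pick the peaks and valleys of the coefficient curve rather than a fixed count.

```python
def select_by_regression_coefficients(wavelengths, b_coefficients, n_keep):
    """Keep the n_keep wavelengths with the largest |b-coefficient|."""
    ranked = sorted(range(len(b_coefficients)),
                    key=lambda j: abs(b_coefficients[j]),
                    reverse=True)
    kept = sorted(ranked[:n_keep])  # restore spectral order
    return [wavelengths[j] for j in kept]


# Hypothetical wavelengths (nm) and b-coefficients, for illustration only.
wavelengths_nm = [450, 550, 680, 820, 970]
b_coefficients = [0.8, -0.1, -1.2, 0.05, 0.6]
print(select_by_regression_coefficients(wavelengths_nm, b_coefficients, 3))
# [450, 680, 970]
```

The sign of a coefficient is ignored on purpose: a strongly negative coefficient marks a wavelength as informative just as a strongly positive one does.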

Modeling methods and model evaluation

Model validation is an important step in multivariate data analysis. The prediction models for this study were constructed using PLSR and LS-SVM. PLSR is a linear multivariate algorithm that is effective when a linear relationship exists between spectra and object properties^43,44,45, and it is widely employed in chemometrics to analyze the correlation between spectral data and reference quality indicators. The PLSR model used a set of statistically uncorrelated latent variables to predict Car and pH levels. Through decomposition, this method generates principal factors from the independent and dependent variables as they are projected into a new multidimensional space. Seven PLSR factors were chosen for this investigation according to the correlation strength of the principal factors. Notably, both PLSR and LS-SVM were applied to the prediction models. However, PLSR was solely utilized to model the full spectra⁴⁴.
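To make the latent-variable idea concrete, the following is a minimal single-response PLS (PLS1) sketch using the NIPALS procedure in plain Python. It is an illustration under assumptions, not the Unscrambler routine used in the study: the function names and toy data are hypothetical, and the toy example uses two factors rather than the seven reported above.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fit_pls1(X, y, n_components):
    """Fit PLS1 by NIPALS; returns means plus (weight, loading, q) per factor."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centred spectra
    f = [v - y_mean for v in y]                                # centred response
    components = []
    for _ in range(n_components):
        w = [_dot([E[i][j] for i in range(n)], f) for j in range(p)]
        norm = math.sqrt(_dot(w, w))
        if norm < 1e-12:
            break  # no covariance left between spectra and response
        w = [v / norm for v in w]
        t = [_dot(E[i], w) for i in range(n)]                  # scores
        tt = _dot(t, t)
        load = [_dot([E[i][j] for i in range(n)], t) / tt for j in range(p)]
        q = _dot(f, t) / tt
        # Deflate: remove what this factor explains from X and y.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, load, q))
    return {"x_mean": x_mean, "y_mean": y_mean, "components": components}


def predict_pls1(model, x):
    """Predict one sample by computing its scores factor by factor."""
    e = [v - m for v, m in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, load, q in model["components"]:
        t = _dot(e, w)
        y_hat += q * t
        e = [ev - t * lv for ev, lv in zip(e, load)]
    return y_hat


# Toy data following y = 2*x1 + 3*x2 + 1 exactly.
X_toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y_toy = [3.0, 4.0, 6.0, 8.0]
pls_model = fit_pls1(X_toy, y_toy, n_components=2)
print(predict_pls1(pls_model, [3.0, 2.0]))
```

Because the toy response is an exact linear function of two independent variables, two factors reproduce it up to rounding error; real spectra need the number of factors chosen by validation.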

Based on concepts from statistical learning theory, SVM can be used for classification and nonlinear regression. LS-SVM is an enhancement of traditional SVM: it solves a set of linear equations derived from a least-squares loss function rather than the traditional convex quadratic programming problem⁴⁶. LS-SVM is often preferred over SVM because of its low computational complexity and efficiency.

$$y\left(x\right)={\sum }_{k=1}^{N}{a}_{k}K\left(x,{x}_{k}\right)+b$$

(2)

In Eq. (2), $K\left(x,{x}_{k}\right)$ represents the kernel function, ${x}_{k}$ indicates the input vectors, ${a}_{k}$ denotes the support values, and $b$ indicates the bias factor. The kernel function computes the similarity between the input vectors, and the choice of kernel function influences the efficacy of the model.
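A minimal LS-SVM regression sketch shows how the support values and bias in Eq. (2) can be obtained from one linear system and then used for prediction. The RBF kernel, the regularization parameter `gamma`, the kernel width `sigma2`, and the toy data are assumptions for this example; the text does not state which kernel or parameter values the study used.

```python
import math


def rbf_kernel(a, b, sigma2):
    """RBF kernel K(a, b) = exp(-||a - b||^2 / (2 * sigma2))."""
    return math.exp(-sum((x - y) ** 2 for x, y in zip(a, b)) / (2.0 * sigma2))


def solve_linear_system(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_ls_svm(X, y, gamma, sigma2):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b; a] = [0; y] for b and a."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [rbf_kernel(X[i], X[j], sigma2) for j in range(n)]
        row[i + 1] += 1.0 / gamma
        A.append(row)
    solution = solve_linear_system(A, [0.0] + list(y))
    return {"b": solution[0], "alpha": solution[1:], "X": X, "sigma2": sigma2}


def predict_ls_svm(model, x):
    """Evaluate y(x) = sum_k a_k K(x, x_k) + b."""
    return model["b"] + sum(a * rbf_kernel(x, xk, model["sigma2"])
                            for a, xk in zip(model["alpha"], model["X"]))


# Hypothetical one-dimensional training data.
X_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [0.0, 1.0, 4.0, 9.0]
svm_model = fit_ls_svm(X_train, y_train, gamma=1e6, sigma2=1.0)
print(predict_ls_svm(svm_model, [1.5]))
```

Training reduces to solving one system of N + 1 linear equations, which is the computational advantage over the quadratic programming step of a standard SVM.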

Furthermore, a correlation analysis was conducted to assess the RC of the simplified models, investigating the association between the EWs and the quality features.

Model evaluation

The prediction capacities of the models were assessed by calculating statistical metrics, including the coefficient of determination of calibration (R²_cal), coefficient of determination of prediction (R²_pre), root mean square error of calibration and prediction (RMSEC, RMSEP), and RPD, which are defined as follows:

$${R}^{2}=1-\frac{{\sum }_{i}{({\widehat{y}}_{i}-{y}_{i})}^{2}}{{\sum }_{i}{(\overline{y }-{y}_{i})}^{2}}$$

(3)

$${\text{RMSE}}=\sqrt{\frac{1}{{\text{m}}}{\sum }_{{\text{i}}=1}^{{\text{m}}}{({\widehat{y}}_{i}-{y}_{i})}^{2}}$$

(4)

$${\text{RPD}}=\frac{{\text{SD}}}{{\text{RMSE}}}$$

(5)

where m represents the number of samples, ${\widehat{{\text{y}}}}_{{\text{i}}}$ represents the predicted value, ${{\text{y}}}_{{\text{i}}}$ represents the actual value and $\overline{{\text{y}} }$ represents the mean value of the actual value. SD is the standard deviation of the validation sample.
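The three metrics can be computed as in the sketch below, which uses the usual definition of the coefficient of determination, one minus the ratio of the residual to the total sum of squares. The function names and example values are hypothetical, and using the sample standard deviation (n − 1 denominator) for SD is an assumption, since the text does not specify it.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((p - t) ** 2 for p, t in zip(y_pred, y_true))
    ss_tot = sum((mean - t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error over m samples."""
    m = len(y_true)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / m)


def rpd(y_true, y_pred):
    """RPD = SD of the reference values divided by RMSE."""
    m = len(y_true)
    mean = sum(y_true) / m
    sd = math.sqrt(sum((t - mean) ** 2 for t in y_true) / (m - 1))
    return sd / rmse(y_true, y_pred)


# Hypothetical reference and predicted values for a small validation set.
y_ref = [1.0, 2.0, 3.0, 4.0, 5.0]
y_hat = [1.1, 1.9, 3.2, 3.8, 5.0]
print(round(r_squared(y_ref, y_hat), 4))  # 0.99
print(round(rmse(y_ref, y_hat), 4))       # 0.1414
print(round(rpd(y_ref, y_hat), 2))        # 11.18
```

Applied to the calibration set, `rmse` gives RMSEC; applied to the prediction set, it gives RMSEP.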

When the RPD value is greater than 2.5, it indicates a high capacity for prediction⁴⁷. Spectral data extraction was conducted in ENVI 4.8 (ITT, Visual Information Solutions, Boulder, USA). All computations and multivariate data analyses were performed with the chemometric software Unscrambler® 9.7 (CAMO AS, Oslo, Norway) and MATLAB R2009b (The MathWorks, Natick, USA).

Biochemical analyses

After acquiring hyperspectral images, the samples were immediately sliced and weighed for subsequent chemical analysis. Each measurement was performed three times⁴⁸. To extract the pigments, we immersed 0.1 g of fresh-weight material in 20 ml of a solution containing 80% acetone and 100% ethanol (1:1 ratio) for 24 h in darkness. For the pH measurements, the composite pH electrode was thoroughly washed in pure water and then shaken dry. The pH meter was first calibrated in the pH 4.00 calibration solution, after which it was rinsed with distilled water and dried. It was then calibrated using a standard buffer solution with a pH of 7.00^49,50, cleaned with pure water, dried, and calibrated using a pH 9.18 solution, which completed the three-point calibration. A sufficient amount of pulp was extracted from each sample and squeezed to obtain juice, and the electrode was then immersed in the juice to measure the pH value. Each sample underwent three measurements following the described procedure, and the average of the three readings was taken as the pH value.

We measured the Car levels with a 752 UV/Vis spectrophotometer and expressed them on a fresh-weight basis using standard techniques^51,52.

Results

Hyperspectral reflectance spectra

Figure 3a shows how the carrot region of interest was identified. Figure 3b and c show the 400–1000 nm xylem and cortex spectra of the Fz carrot cultivar, respectively. These spectra were taken from the hyperspectral images of the calibration set samples. The spectra from all sides follow the same pattern across the whole wavelength range, but with some notable deviations. The spectral curves exhibited distinct absorption and reflection peaks, as can be seen in Fig. 4. Based on the spectral images obtained from both cultivars, the reflectivity of the Fz-xylem and Pt-xylem sides is slightly higher than that of the Fz-cortex and Pt-cortex sides within the visible range of 420 to 680 nm, and it increases significantly relative to the cortex sides within the near-infrared range of 780 to 1000 nm.

The first discernible absorption peak corresponds to a typical Car absorption band at 680 nm. The second absorption peak is centered around 750 nm, a relatively wide absorption band associated with the fourth overtone of the C–H bond. The small absorption band at 950 nm may be related to the second overtone of the O–H bond⁵³.

In addition to the typical absorption characteristics, the spectral intensities of different samples were different, indicating differences in chemical components, which was conducive to constructing the Car and pH quantitative analysis model.

PLSR models based on the full spectra

We used PLSR to establish regression models with the xylem and cortex datasets. The regression results are shown in Tables 1 and 2. For the PLSR models, the samples were taken in the same order on the carrot xylem and cortex sides. The calibration and prediction sets for both regions of the carrot were also the same.

Table 1 Results and parameters of the calibration and prediction sets of Car by partial least squares regression (PLSR) models.

Table 2 Results and parameters of the calibration and prediction sets of pH by PLSR models.

As indicated by the R²_pre, RMSEP, and RPD values, the results displayed in Tables 1 and 2 illustrate the ability to predict carotenoid quality and pH in the xylem and cortex regions of the Fz and Pt cultivars. The RMSEP values for Fz and Pt in the xylem region were 0.026 and 0.027, respectively, while the R²_pre values for predicting carotenoid quality ranged from 0.903 to 0.915 for Fz and from 0.876 to 0.885 for Pt. Furthermore, the region where Fz and Pt had RPD values of 2.19 and 2.21, respectively, demonstrated a greater ability to predict carotenoid quality. In contrast, the R²_pre values obtained for pH in the xylem and cortex regions were comparatively lower, ranging from 0.666 to 0.702, and the RMSEP values ranged from 0.022 to 0.035. All RPD values were less than 2, indicating a satisfactory level of predictive accuracy despite the lower R²_pre values. The RPD values in the cortex region were also slightly higher than those in the xylem region, which implies that pH prediction performed somewhat better in the cortex region.

Selection of effective wavelengths

Choosing a configuration with fewer wavebands is recommended to enhance the stability and integrability of a multispectral imaging system (ElMasry et al. 2019⁵⁴). SPA was used to identify the EWs carrying crucial information for determining scaling rates and reducing data dimensionality. The EWs were selected from the whole spectral range (400–1000 nm) and retain the most important information while discarding redundant data. Table 3 demonstrates that only the important wavelengths are required to estimate Car and pH. The pH decreased the number of wavelengths from 5 to 9, contrasting the pH range (8 to 14) detected in the xylem of both cultivars, which showed a wavelength range from (Table 3).

Table 3 The selected EWs for Car and pH using RC and SPA.

Prediction of pH

Table 3 shows that fewer EWs were selected for the xylem spectra than for the cortex spectra. However, their predictive capability was much better.

Besides, Table 4 showed that the prediction under the RC-PLSR model had an R²_Pre of 0.672 and an RMSEP of 0.030, while the counterpart had an R²_Pre of 0.752 and an RMSEP of 0.029. However, the RC-LS-SVM model had an R²_Pre of 0.701 and an RMSEP of 0.032, while the counterpart had an R²_Pre of 0.802, an RMSEP of 0.026, an R²_Pre of 0.757, and an RMSEP of 0.024.

Table 4 LS-SVM and PLSR models calibration and prediction of pH using EWs (RC and SPA).

The SPA-PLSR had an R²_Pre of 0.678 and an RMSEP of 0.031, while the SPA-LS-SVM had an R²_Pre of 0.731 and an RMSEP of 0.030, and the counterpart had an R²_Pre of 0.816 and an RMSEP of 0.028. The same pattern observed in the Fz samples also held for the Pt samples, as evidenced by the data presented in Table 4. Our findings indicate that the spectral characteristics of the xylem were more effective in predicting Car than those of the cortex.

Prediction of carotenoid

Here, we used EWs to build Car- and pH-predicting models for two carrot cultivars, namely Fz and Pt. The Car prediction results for the Fz samples are shown in Table 5.

Table 5 LS-SVM and PLSR models calibration and prediction of Car using EWs (RC and SPA).

Table 5 shows that the Fz cultivar RC-PLSR model had an R²_Pre value of 0.892 and an RMSEP value of 0.023 in the xylem, compared with R²_Pre = 0.854 and RMSEP = 0.030 for its counterpart (using the cortex). The RC-LS-SVM results showed an R²_Pre value of 0.933, an RMSEP value of 0.022, and an RPD of 2.27, while its counterpart had an R²_Pre value of 0.883, an RMSEP value of 0.026, and an RPD of 1.8.

For the SPA-PLSR model, the R²_Pre value was 0.896, the RMSEP value was 0.023, and the RPD was 2.17, while its counterpart had an R²_Pre value of 0.815, an RMSEP value of 0.024, and an RPD of 2.08. On the other hand, the LS-SVM model had an R²_Pre value of 0.934, an RMSEP value of 0.022, and an RPD of 2.27, and its counterpart had an R²_Pre value of 0.893, an RMSEP value of 0.024, and an RPD of 2.08. Regarding prediction accuracy for Car content in both the calibration and prediction sets, the outcomes demonstrate that the LS-SVM models performed better overall than the PLSR models, with better RMSEC and RMSEP values.

The LS-SVM model exhibited notably robust outcomes for the xylem region among the cultivars, whereas Fz maintained a consistently high performance across all models and regions. Conversely, the cultivar Pt obtained superior performance from the PLSR model in the cortex region. As listed in Table 5, the Pt samples exhibited the same pattern as the Fz samples. The xylem spectra were better suited to Car prediction than the cortex spectra.

Figures 5 and 6 illustrate the optimal prediction outcomes according to the selected-range spectra.

Discussion

Analysis of characteristic wavelengths

Here, the spectral window ranged from 400 to 1000 nm. The outcomes of the wavelength selection are shown in Table 3. The EWs for Car were determined in the xylem and cortex, ranging from 410 to 956 nm, whereas the EWs for pH contents were between 500 and 900 nm. In related research, Car pigments were observed at wavelengths between 400 and 500 nm⁵⁵, at 450 nm and 580 nm^25,26, and at 400–600 nm⁵⁶. Additional peaks were observed in the xylem region at 820 and 980 nm and in the cortex region at 814 and 970 nm. Acids have been reported at 800 nm⁵⁶, and sugars at 835 nm²⁴ and 840 nm⁵⁷. Therefore, the peak at 820 nm in the carrot xylem and 814 nm in the cortex could be related to acids and sugars in both cases. Water has been detected at 960 nm⁵⁸, 970 nm⁵⁶, and 976 nm²⁴, and water and sugars together at 970 nm^59,60, 960–980 nm⁶¹, and 970–980 nm⁶². Hence, the peaks at 980 nm and 970 nm may be attributed to water and sugars. Some minor differences in wavelength reflectance were also observed between the xylem and cortex.