Quantification of salt stress in wheat leaves by Raman spectroscopy and machine learning

The salinity level of the growing medium has diverse effects on the development of plants, including both physical and biochemical changes. To determine the salt stress level a plant endures, one can measure these structural and chemical changes. Raman spectroscopy and biochemical analysis are among the most common techniques in the literature. Here, we present a combination of machine learning and Raman spectroscopy with which we can both identify the biochemical changes that occur as the medium salt concentration changes and accurately predict the level of salt stress a wheat sample experiences using our trained regression models. In addition, by applying different machine learning algorithms, we compare their success and determine the best method for this application. Production units can take action based on the quantitative salt stress information provided by the trained machine learning models, which can potentially increase efficiency and prevent crop losses.

Wheat (Triticum aestivum L.) is a cereal crop that can quickly adapt to environmental conditions. It is also one of the most widely cultivated and produced crops worldwide, owing to its high protein and carbohydrate content 1 . Wheat is a nutritious plant that meets 20-80% of the energy and protein needs of humans 2 . Turkey produced approximately 23 million tons of wheat in 2015, and this production decreased to 20 million tons in 2018.
One of the significant reasons for this decrease is climate change, which affects the weather and plant growth via stressor factors such as salinity. Salinity occurs due to the accumulation of water-soluble salts in the upper part of the soil 3 . Salinity affects the morphology, anatomy 4 , growth, root length, and osmotic pressure of the plants 5,6 .
Biochemical methods are frequently used to determine the amounts of malondialdehyde, chlorophyll, and carotenoids under optimal and stress conditions. The most commonly practiced technique for this type of biochemical analysis is UV-visible spectrophotometry. However, this method has limitations in sample preparation, measurement, and analysis, which have led scientists to seek alternatives. Recent studies have used Raman spectroscopy to reveal the effects of salinity stress on the microstructure of wheat plants.
Raman spectroscopy is a vibrational spectroscopic technique that provides information about the internal molecular structure of a sample illuminated with a coherent source. Because inelastically scattered photons are shifted in frequency by the vibrational frequencies of the molecules, the peaks in the spectrum carry intrinsic information about the sample, such as the type and amount of the vibrating molecular bonds 7,8 . Raman spectroscopy is widely used in biology owing to advantages such as non-invasiveness and speed 9,10 . Schulz and Baranska documented the use of Raman spectroscopy in plant biology with a detailed list of tentative band assignments 11 . Raman spectroscopy studies have been carried out on many plants, including fennel fruits, chamomile clusters, turmeric roots, untreated summer wheat leaves, wheat grains, and fresh soybeans 12-15 .
Raman spectroscopy provides rapid results with little or no sample preparation. Drought stress, in particular, has already been investigated using benchtop Raman spectroscopy combined with statistical methods 13,15,16 .
In addition to the contributions mentioned above, chemometric analysis has greatly helped in interpreting plant Raman signals 17-19 . Cebi et al. detected the amino acid L-cysteine in wheat flour using chemometric methods such as principal component analysis (PCA) and hierarchical cluster analysis (HCA) 17,18 . We employed a machine learning-supported Raman spectroscopy approach to determine the level of salt stress to which the plant was exposed. The outline of our procedure is shown in Fig. 1. Our study employed a range of machine learning algorithms and compared their training times, prediction speeds, and success rates.

Results and discussion
Biochemical analysis using Raman spectroscopy. To determine the weekly biochemical changes caused by different levels of salt stress, we employed Raman spectroscopy. In Fig. 2 we compared the average normalized Raman spectra for each medium salt concentration (0, 50, 100, 150 mM) in each week. The spectra showed the expected general trend with changing concentration.
Although this anticipated pattern was observed during the period of maturity (Fig. 2c,d), there were some discrepancies in the early and late developmental periods. In the first 2 weeks (Fig. 2a,b), … grown, and they showed some heterogeneity when we scanned different areas of the leaf sample. A similar but less prominent effect was also apparent in the fifth-week spectra (Fig. 2e). Previous studies report that plants exposed to long-term salt stress respond at vegetative, physiological, and biochemical levels to adapt to the situation and survive under stress conditions. These responses may differ between genotypes under short-term and long-term exposure to salt stress, as in the strawberry plant 20 .
To examine the important biochemical changes more closely, we plotted the Raman intensities of the 522, 747, 855, 1515, and 1563 cm⁻¹ bands (Fig. 3). As given in Table 1, these bands correspond to cellulose, pectin, serine, carotenoid, and chlorophyll b, respectively 11,21-24 . We observed that the cellulose, pectin, and serine bands tend to decrease as the salt concentration of the medium increases, particularly during the period of maturity (weeks three to five). In week five, however, these bands increased at 150 mM, contradicting the general trend. Polysaccharides such as cellulose and pectin, together with amino acids, are important components in cell wall formation. Our results therefore suggest that, depending on the duration of stress exposure, cell wall components are negatively affected by harmful metabolites such as reactive oxygen species produced under salt stress 26 . The sudden increase in the fifth week suggests that the plant strives to preserve its integrity, and it admits two explanations: either the plant, by regulating its whole metabolism to adapt to salt stress, moves to a new level from the fifth week onward, or the fifth-week measurements contain an experimental error. The change in the amount of chlorophyll and carotenoid depends on the concentration of the applied salt stress and the application …
Table 1. Tentative assignments of Raman shift values. ν: symmetric stretching; ω: wagging; δ: symmetric bending; δa: antisymmetric bending.
We also followed the approach of Huang et al. to further investigate the signals related to vital markers 27 . We modified the approach by investigating the whole Amide I region instead of only the 1602 cm⁻¹ band, since the samples and their responses are not the same.
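The band trends in Fig. 3 can be outlined by reading each pre-processed spectrum at the sampled Raman shift nearest to each band center and averaging per medium concentration. The following NumPy sketch assumes that one week's spectra are stored as a 2-D array with a matching shift axis and a per-spectrum concentration label; the function names are hypothetical and not taken from the study.

```python
import numpy as np

# Band centers (cm^-1) and tentative assignments quoted in the text (Table 1).
BANDS = {522: "cellulose", 747: "pectin", 855: "serine",
         1515: "carotenoid", 1563: "chlorophyll b"}

def band_intensity(shift, spectra, center):
    """Intensity of every spectrum at the sampled Raman shift nearest to `center`."""
    idx = int(np.argmin(np.abs(np.asarray(shift) - center)))
    return np.asarray(spectra)[:, idx]

def band_trends(shift, spectra, concentrations):
    """Mean band intensity per medium salt concentration (mM) for one week.

    shift: (n_points,) Raman shift axis in cm^-1
    spectra: (n_spectra, n_points) pre-processed spectra
    concentrations: (n_spectra,) medium NaCl concentration of each spectrum
    """
    concentrations = np.asarray(concentrations)
    trends = {}
    for center, name in BANDS.items():
        values = band_intensity(shift, spectra, center)
        trends[name] = {float(c): float(values[concentrations == c].mean())
                        for c in np.unique(concentrations)}
    return trends
```

Taking the nearest sample rather than integrating over a band window is a simplification; a band-area average would be less sensitive to noise.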
We used band component analysis to reveal the sub-components of the broad peaks measured by Raman spectroscopy. For this analysis we used the Grams AI 8 software, whose curve-fitting procedure takes the line shape (Gaussian, Lorentzian, or Voigt) from the user. We chose a Gaussian line shape since it better suits the broadening profile of biological samples. In the analysis, we first determined the number of peaks under the curve using the software's second-derivative option, and then iterated the curve-fitting procedure until the fit converged. The calculated peak parameters were band center position, height, width, and area. To track how the bands change with the applied salt level in particular weeks, we compared peak heights and positions in the Amide I region and show them in Fig. 4.
Most of the Raman intensities shown in the figure decreased as the weeks progressed. The Raman shift values, on the other hand, decreased consistently only for the bands at 1619, 1669, and 1686 cm⁻¹. An amide bond forms between the carboxyl group (-COOH) of one molecule and the amino group (-NH2) of another, with the hydroxyl of the carboxyl group and a hydrogen of the amino group released as water. Amide bonds therefore link amino acids into the polypeptide chains that make up proteins. Proteins play an important role in regulating metabolism to maintain cellular integrity under stress 28 , so the changes we detected in the amide bands indicate that the plant is struggling with stress. These Raman bands are characteristic of the amino acids methionine, lysine, and tyrosine, all of which are building blocks of proteins and can also be found free in the cell. These amino acids participate in many metabolic processes in the cell, as well as in regulating the intracellular defense system in response to stress factors such as salt stress 28 . In our study, the amount of these amino acids decreased with the duration of the applied salt stress; within each week, however, it increased with the applied salt concentration. Farhangi-Abriz and Ghassemi-Golezani 28 obtained results similar to ours in soybean. Alfosea-Simon et al., on the other hand, showed that exogenous application of amino acids such as methionine, lysine, and tyrosine to tomatoes reduced the harmful effects of salt stress 29 .
Changes in Raman shift indicate changes in the molecular structure of the plant we measured. As oxidative stress products accumulate with increasing salt levels, the Amide I band positions shift 30 and protein levels change during stress 31 . These changes were in line with the band component analysis results.
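Grams AI is commercial software, so as a stand-in the band component analysis can be sketched with SciPy: candidate band centers are taken from minima of the numerically computed second derivative, and a sum of Gaussians is then fitted by nonlinear least squares with `scipy.optimize.curve_fit`. The curvature threshold and starting width are assumed tuning parameters, and the reported width is the Gaussian standard deviation rather than a full width at half maximum.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.signal import find_peaks

def gaussian_sum(x, *params):
    """Sum of Gaussians; params = (height, center, sigma) repeated for each band."""
    y = np.zeros_like(x, dtype=float)
    for h, c, w in zip(params[0::3], params[1::3], params[2::3]):
        y += h * np.exp(-0.5 * ((x - c) / w) ** 2)
    return y

def initial_centers(x, y, min_curvature):
    """Candidate band centers: local minima of the second derivative d2y/dx2."""
    d2 = np.gradient(np.gradient(y, x), x)
    idx, _ = find_peaks(-d2, height=min_curvature)
    return x[idx]

def fit_bands(x, y, min_curvature, width0=10.0):
    """Fit Gaussian band components and return center, height, sigma and area."""
    centers = initial_centers(x, y, min_curvature)
    if len(centers) == 0:
        raise ValueError("no bands found; lower min_curvature")
    p0 = []
    for c in centers:
        p0 += [float(np.interp(c, x, y)), float(c), width0]
    params, _ = curve_fit(gaussian_sum, x, y, p0=p0, maxfev=20000)
    bands = []
    for h, c, w in zip(params[0::3], params[1::3], params[2::3]):
        area = h * abs(w) * np.sqrt(2.0 * np.pi)  # analytic area of a Gaussian
        bands.append({"center": c, "height": h, "width": abs(w), "area": area})
    return sorted(bands, key=lambda b: b["center"])
```

On measured spectra the second derivative is noisy, so smoothing before differentiation (or a higher threshold) would usually be needed.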
Quantification of the acquired Raman spectra using a wide variety of machine learning regression algorithms. To quantify the salt stress level a plant endures, we trained various machine learning algorithms. We used regression models whose output is the medium salt concentration, which can be used to determine the level of stress a plant goes through. We used five subgroups of regression models: linear regression, Gaussian process regression (GPR), neural network (NN), support vector machine (SVM), and regression tree. Table 2 lists all these models with their respective success rates, training times, and prediction speeds. The spectra of the first 2 weeks were heterogeneous: they varied erratically when we scanned different areas of the same sample. Consequently, we did not perform regression analysis on the spectra of the first 2 weeks. To measure the success of the models, we examined R² and the root mean square error (RMSE). Although we did not consider training time and prediction speed when selecting the best-performing model, they are significant parameters, especially in real-time applications. Gaussian process regression (RMSE = 14.192-14.634 mM, R² = 0.923-0.933) and neural network models (RMSE = 14.837-15.784 mM, R² = 0.913-0.923) gave the best results. In terms of mean RMSE and R², the best-performing model overall was rational quadratic Gaussian process regression (RQGPR). We used the test portion of the data to determine the model's success rate. In Fig. 5a-c, we show the RQGPR model's predictions for the test set. The median of the predictions for each true concentration is approximately equal to the true concentration, an indicator of a well-performing model. To make this observation more visible, we also present the residuals for each concentration in Fig. 5d-f. The medians in the residual plots are all around zero.
This indicates that, given the age of the observed leaf in weeks (which selects the model trained for that week), the model can predict the salt level of the plant.
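The two success metrics and the per-concentration residual medians discussed above can be computed as follows. This is a plain NumPy sketch; the residual sign convention (prediction minus true concentration) is an assumption, since the text does not state which convention the residual plots use.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error, in the units of the target (here mM)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def median_residual_per_level(y_true, y_pred):
    """Median of (prediction - true value) for each true concentration level."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return {float(c): float(np.median(y_pred[y_true == c] - c))
            for c in np.unique(y_true)}
```

Medians near zero for every level, as in Fig. 5d-f, mean the model is not systematically biased at any concentration, which RMSE and R² alone do not reveal.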
To test for overfitting, we took new wheat leaves, provided by members of the biology department, from different plants that we had not measured before. The ages and concentrations of the leaves were mixed, and initially we used only the current-week information in the predictions. We applied our models and made predictions based on the given information. In Fig. 6 we show the residuals of the predictions relative to the actual salt concentrations; our predictions deviated from the true concentration by around 20 mM. To test our ability to predict the week and the salt concentration together using the trained models, we used the untrained test samples and found the week corresponding to the minimum deviation from the actual value. In Fig. 6b, we show the residuals for all weeks. Since this is a second-week sample, our results are consistent with the current state of the plant. Together with the concentration predictions above, this shows that our tool can be used to find the salt level of a wheat leaf whose age is not known.
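The week-estimation step can be sketched as choosing the per-week model whose prediction deviates least from the known concentration. In this sketch each week's trained model is represented by a hypothetical callable, and summarizing a leaf's many spectra by the median prediction is an assumption.

```python
import numpy as np

def estimate_week(models, spectra, true_concentration):
    """Return the week whose model deviates least from the known concentration.

    models: dict mapping week -> callable(spectra) -> predictions in mM
            (hypothetical stand-ins for the per-week trained regressors)
    spectra: (n_spectra, n_points) pre-processed spectra of one leaf
    """
    deviations = {}
    for week, predict in models.items():
        preds = np.asarray(predict(spectra), float)
        deviations[week] = float(abs(np.median(preds) - true_concentration))
    best_week = min(deviations, key=deviations.get)
    return best_week, deviations
```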
To test our tool's ability to predict an untrained group, we left the 100 mM group out of the training data and trained on the others. We tested this model on the left-out group and found that the median prediction deviates by −25.35 mM from the expected value. Although the deviation is smaller for the model trained on the 80/20 split, whose training set includes data from all concentrations, the deviation for the untrained group is still reasonable for practical use.

… Acid), and 8 g/l agar were added to the regeneration medium as a carbon source, a plant growth regulator, and a thickener, respectively. To determine the sensitivity of wheat embryos to salt stressors, different concentrations of NaCl (50, 100, and 150 mM) were added to the regeneration medium described above. The pH of each mixture was adjusted to between about 5.8 and 6.0. The media were sterilized in an autoclave at 121 °C and 1.2 atm for 20 min and poured in equal amounts into sterile Magenta vessels under aseptic conditions 32 . For surface sterilization, the wheat seeds were kept in 70% ethyl alcohol for 20 s, rinsed three times in sterile distilled water, kept in 20% commercial bleach for 20 min, and rinsed three times in distilled water to remove the bleach. The seeds were then incubated in sterile distilled water for 2 h in an oven at 35 °C and allowed to swell 32,33 . After surface sterilization and swelling, the mature seed embryos were separated from their endosperms under aseptic conditions and inoculated into the media prepared for salt stressor application. Cultures were maintained at 25 ± 2 °C for 4 weeks under a 16 h light/8 h dark photoperiod 32 .

Methods
Wheat leaf samples were prepared for Raman spectral measurements by cutting leaves from the plant near their tips with a scalpel. For consistency, we cut leaves of nearly the same size for each sample, and we measured the leaves while they were still fresh.
Experimental setup and data acquisition. Leaf specimens were compressed between two slides, each 1 mm thick, to ensure that the region of interest of the sample would lie in the focal plane during the measurement. Prepared specimens were scanned with our hand-built motorized Raman spectral scanning setup.
We used a diode laser with a wavelength of 785 nm and an output power of 500 mW. The laser output was steered to a spatial filtering section to select the portion of the beam with a Gaussian profile. This Gaussian beam was directed to the microscope objective (MO) by reflection from a dichroic mirror (Thorlabs, DMLP 805) and several beam-steering mirrors. The beam diameter was adjusted to fill the numerical aperture of the MO, whose magnification and numerical aperture were 10× and 0.25, respectively. Since the working distance was greater than the slide thickness, we were able to illuminate the wheat leaves in this configuration. The backscattered photons travel back along the same path (180° geometry) to the dichroic mirror. The inelastically scattered portion of the beam (the Raman beam) is transmitted through the dichroic mirror because the Stokes photons have longer wavelengths than the excitation (we collected Stokes photons because of their higher intensity). This transmitted beam was coupled into the fiber coupling port using an achromatic lens with a 9 mm focal length, and the coupled photons were delivered to a Czerny-Turner type USB spectrometer (Ocean QE Pro) via a multimode fiber (NA 0.22).
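The Raman shift of a Stokes photon is the difference between the excitation and scattered wavenumbers, ν̃ = 10⁷/λ_ex − 10⁷/λ_s (wavelengths in nm, shift in cm⁻¹). The sketch below converts between the two for the 785 nm excitation used here. For example, the 522-1563 cm⁻¹ bands discussed in the Results fall between roughly 818.5 and 894.8 nm, and taking 805 nm as the dichroic's nominal edge (an assumption based on the part name) gives a lower limit of about 316 cm⁻¹.

```python
def raman_shift_cm1(excitation_nm, scattered_nm):
    """Raman shift (cm^-1) between excitation and scattered wavelengths (nm)."""
    return 1e7 / excitation_nm - 1e7 / scattered_nm

def stokes_wavelength_nm(excitation_nm, shift_cm1):
    """Wavelength (nm) at which a Stokes band with the given shift is detected."""
    return 1e7 / (1e7 / excitation_nm - shift_cm1)

EXCITATION_NM = 785.0  # diode laser wavelength stated in the text

# Detector wavelengths of the bands followed in Fig. 3.
band_wavelengths = {b: stokes_wavelength_nm(EXCITATION_NM, b)
                    for b in (522, 747, 855, 1515, 1563)}
```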
We collected the spectra with an integration time of 2 s. To denoise the spectra, we used boxcar averaging with a window width of 3 pixels, which let us avoid averaging repeated acquisitions and saved a significant amount of time. We scanned the samples in 5 μm steps along both axes of a 300 μm × 300 μm region of interest, giving 3721 (61 × 61) spectra per scan. This procedure was repeated for different regions of different leaves.
Data analysis. Raman spectra pre-processing. Raman spectroscopy measurements also contain a fluorescence background. To analyze only the Raman peaks, this baseline must be estimated and removed. To do so, we determined the local minimum points of each spectrum and fitted a line between each consecutive pair of minima along the Raman shift axis. Concatenating these line segments gives the total estimated fluorescence profile, which we calculated for each spectrum and subtracted from it.
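The pre-processing steps above can be sketched in NumPy as a 3-pixel boxcar followed by a piecewise-linear baseline drawn through the spectrum's local minima. The text does not say how minima were selected, so the strict neighbor-comparison rule and the inclusion of both endpoints are assumptions, as is the edge padding in the boxcar.

```python
import numpy as np

def boxcar(spectrum, width=3):
    """Moving average over `width` pixels; edges are padded with edge values."""
    if width % 2 == 0:
        raise ValueError("width must be odd")
    padded = np.pad(np.asarray(spectrum, float), width // 2, mode="edge")
    return np.convolve(padded, np.ones(width) / width, mode="valid")

def local_minima_baseline(shift, spectrum):
    """Piecewise-linear baseline joining consecutive local minima of the spectrum."""
    y = np.asarray(spectrum, float)
    # Interior points lower than the left neighbor and not higher than the right.
    interior = np.where((y[1:-1] < y[:-2]) & (y[1:-1] <= y[2:]))[0] + 1
    anchors = np.concatenate(([0], interior, [len(y) - 1]))
    return np.interp(shift, shift[anchors], y[anchors])

def correct_baseline(shift, spectrum):
    """Subtract the estimated fluorescence profile from one spectrum."""
    return np.asarray(spectrum, float) - local_minima_baseline(shift, spectrum)
```

On real, noisy spectra every noise dip becomes a minimum, so it is likely more robust to compute the baseline on smoothed data, e.g. `correct_baseline(shift, boxcar(raw))`.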
After baseline correction, we applied L2 normalization, also known as vector normalization: we calculated the vector norm of each spectrum and divided each element of the spectrum by this norm, which yields a normalized spectrum with unit vector norm. L2 normalization after fluorescence baseline correction has been shown to be one of the most efficient pre-processing methods in Raman spectroscopy 34 .
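Vector normalization amounts to one NumPy operation per spectrum. In this sketch, the guard that leaves an all-zero spectrum unchanged is an addition, not part of the described procedure.

```python
import numpy as np

def l2_normalize(spectra):
    """Divide each spectrum (row) by its Euclidean norm so each row has norm 1."""
    spectra = np.asarray(spectra, float)
    norms = np.linalg.norm(spectra, axis=1, keepdims=True)
    norms = np.where(norms == 0, 1.0, norms)  # leave all-zero rows unchanged
    return spectra / norms
```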
Outliers in the data were eliminated by keeping only the portion between the 0.5% and 99.5% quantiles. After elimination, 56,067 spectra remained in the data set and were used in further analysis; the number of spectra remaining in each group is given in Table 3. Each week was treated as a separate data set, and models were trained for each week separately.
Machine learning application. After pre-processing, we trained a collection of machine learning regression models: linear regression, Gaussian process regression, neural network, support vector machine, and regression tree algorithms. We used MATLAB (R2021a, The MathWorks, Inc., Natick) and the Regression Learner app provided in its Statistics and Machine Learning Toolbox. Before training, we randomly split the data into training and test sets in an 80:20 ratio. During training, we used five-fold cross-validation. After training, we evaluated the model on the held-out test set. The best result was given by a Gaussian process regression model with a rational quadratic kernel, a stationary and non-degenerate covariance function 35 :
$$k(x, x') = \sigma_f^2 \left(1 + \frac{r^2}{2\alpha l^2}\right)^{-\alpha} \qquad (1)$$
$$r = \sqrt{(x - x')^{\top}(x - x')} \qquad (2)$$
where r is the Euclidean distance between two points x and x′ in the input domain, l is the characteristic length scale, α is the scale-mixture parameter, and σ_f is the signal standard deviation. The basis function was constant, with a basis matrix H that is an n-by-1 vector of 1s, where n is the number of observations. Initial values of the parameters were determined automatically by the Regression Learner app.
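For readers without MATLAB, the following NumPy sketch shows the prediction step of Gaussian process regression with the rational quadratic kernel k(x, x′) = σ_f²(1 + r²/(2αl²))^(−α) and a constant mean. It is not the Regression Learner implementation: the hyperparameters are fixed rather than optimized, the constant basis coefficient is replaced by the sample mean of the training targets, cross-validation is omitted, and the class and split helper are hypothetical.

```python
import numpy as np

def rq_kernel(A, B, length_scale, alpha, sigma_f):
    """Rational quadratic kernel sigma_f^2 * (1 + r^2 / (2 * alpha * l^2))^(-alpha)."""
    # Squared Euclidean distances r^2 between every row of A and every row of B.
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    sq = np.maximum(sq, 0.0)  # clip tiny negative values caused by round-off
    return sigma_f**2 * (1.0 + sq / (2.0 * alpha * length_scale**2)) ** (-alpha)

class RQGaussianProcess:
    """GP posterior mean with an RQ kernel and fixed (not optimized) hyperparameters."""

    def __init__(self, length_scale=1.0, alpha=1.0, sigma_f=1.0, noise_std=1e-2):
        self.length_scale = length_scale
        self.alpha = alpha
        self.sigma_f = sigma_f
        self.noise_std = noise_std

    def _k(self, A, B):
        return rq_kernel(A, B, self.length_scale, self.alpha, self.sigma_f)

    def fit(self, X, y):
        self.X_train = np.asarray(X, float)
        y = np.asarray(y, float)
        # Simplification: the constant basis term is estimated by the sample mean.
        self.offset = y.mean()
        K = self._k(self.X_train, self.X_train) + self.noise_std**2 * np.eye(len(y))
        L = np.linalg.cholesky(K)  # K = L L^T
        # weights = K^{-1} (y - offset), solved as two triangular systems
        self.weights = np.linalg.solve(L.T, np.linalg.solve(L, y - self.offset))
        return self

    def predict(self, X):
        return self.offset + self._k(np.asarray(X, float), self.X_train) @ self.weights

def split_80_20(n, seed=0):
    """Hypothetical helper: random 80:20 train/test index split."""
    idx = np.random.default_rng(seed).permutation(n)
    cut = int(round(0.8 * n))
    return idx[:cut], idx[cut:]
```

In this setting each row of X would be one pre-processed spectrum and y its medium concentration in mM. The Cholesky step scales as O(n³) in the number of training spectra, which is one reason training time grows quickly for GPR.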

Conclusion
We showed that, with the help of machine learning, Raman spectroscopy can be used to quantitatively determine the level of salt stress a plant is exposed to. The shifts in the Raman spectra give precise information about how much a plant struggles with the stressor factors and how close it is to dying. We also showed that the models achieve high success rates on these data (R² up to 0.933 for Gaussian process regression); therefore, they can be used with high accuracy. Our proposed models could quickly provide information on leaf salinity levels if used with a hand-held Raman spectrometer.
Table 3. Number of Raman spectra left after quantile elimination for each week and medium concentration.
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.