Spectroscopic analysis of chia seeds

Chia seeds are becoming increasingly popular in modern diets. In this contribution, NIR and 2D-fluorescence spectroscopy were used to determine their nutritional composition, mainly fat and protein content. Twenty-five samples of chia seeds were analysed, of which 9 were obtained from different regions in Kenya and 16 were purchased in stores in Germany and originated mostly from South America. For the purchased samples, the nutritional information on the package was recorded in addition to the fat and protein values determined at the Hohenheim Core Facility. For the first time, NIR and fluorescence spectroscopy were used for the analysis of chia. For the spectral evaluation, two different pre-processing methods were tested. Baseline correction with subsequent mean-centring led to the best results for NIR spectra, whereas SNV (standard normal variate transformation) was sufficient for the evaluation of fluorescence spectra. When NIR and fluorescence spectra were combined, the fluorescence spectra were additionally multiplied by a factor to adjust the intensity levels. The best prediction results for the combined spectra were obtained for the Kenyan samples, with prediction errors below 0.2 g/100 g. For the dataset comprising all samples, the absolute prediction error was 0.51 g/100 g for fat and 0.62 g/100 g for protein. The protein and fat contents of chia seeds can thus be determined by fluorescence and NIR spectroscopy, and combining both methods improves the predictions. Chia seeds from Kenya had protein and lipid contents similar to those of South American seeds.

www.nature.com/scientificreports/
apparent amylose contents of milled rice with a standard error of prediction of 0.138% for protein and 1.05% for apparent amylose. Protein in flour samples was also predicted from near-infrared reflectance 21 . Vibrational spectroscopy, namely infrared and Raman spectroscopy, has been applied to chia in food applications, such as chia oil emulsion gels for the production of sausages 22,23 . Infrared radiation is divided into far-, mid- and near-infrared. It lies between visible light as the lower wavelength limit and microwaves as the upper limit. The measuring principle is that molecules absorb electromagnetic radiation and are thereby excited into higher vibrational states. In the near-infrared region, the absorbed energy excites overtone and combination vibrations of C-H, O-H, S-H and N-H groups, which are found throughout all foods. For this reason, NIR spectroscopy is a prominent candidate for the investigation of food properties and was chosen for this investigation alongside fluorescence spectroscopy.
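In the NIR region, absorption bands are mainly overtones and combinations of fundamental vibrations. As a rough illustration, the Python sketch below places the overtones of a C-H stretching fundamental using a purely harmonic model, in which overtone order n lies at (n + 1) times the fundamental wavenumber. The fundamental of about 2900 cm⁻¹ is an assumed, approximate value, and the helper names are hypothetical; because real vibrations are anharmonic, measured overtone bands appear at somewhat longer wavelengths than estimated here.

```python
def wavenumber_to_nm(wavenumber_cm: float) -> float:
    """Convert a wavenumber in cm^-1 to a wavelength in nm (1 cm = 1e7 nm)."""
    return 1e7 / wavenumber_cm


def harmonic_overtones(fundamental_cm: float, max_order: int = 3):
    """Harmonic estimate of overtone positions.

    Returns a list of (overtone order, wavenumber in cm^-1, wavelength in nm).
    The first overtone is placed at 2x the fundamental, the second at 3x, etc.
    Anharmonicity, which shifts real bands to lower wavenumbers, is ignored.
    """
    bands = []
    for order in range(1, max_order + 1):
        wavenumber = (order + 1) * fundamental_cm
        bands.append((order, wavenumber, wavenumber_to_nm(wavenumber)))
    return bands


# Assumed, approximate C-H stretching fundamental (illustrative value).
CH_STRETCH_CM = 2900.0

for order, wn, nm in harmonic_overtones(CH_STRETCH_CM):
    print(f"overtone {order}: {wn:.0f} cm^-1 = {nm:.0f} nm")
```

All three estimates (about 1724, 1149 and 862 nm) fall within the 800 to 2800 nm range used for the NIR measurements, consistent with C-H-rich constituents such as lipids giving NIR signals.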
Fluorescence spectroscopy is well known for its high substance-specific sensitivity and is therefore well established in the biological sciences and food research 24 . The measuring principle is based on the excitation of molecules from the ground state to higher electronic and vibrational states by absorption of a photon. As a consequence, fluorescence light can be emitted when the molecule returns to the electronic ground state. The emitted fluorescence light has a longer wavelength than the excitation light because the excited molecule first relaxes rapidly, without emission, to the lowest vibrational level of the excited electronic state before returning to the electronic ground state. This wavelength shift allows the emitted light to be separated from the excitation light, so that the sample can be analysed 25 . Different fluorophores present in food systems, such as proteins, vitamins, coenzymes and chlorophyll, can therefore be measured by fluorescence spectroscopy.
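The energy argument behind the longer emission wavelength can be checked with E = hc/λ. The Python sketch below uses illustrative values that are not from this study: tryptophan residues in proteins are typically excited near 280 nm and emit near 350 nm, with the exact maxima depending on their environment. The function names are hypothetical helpers.

```python
# Photon energy E = h*c / lambda; with E in eV and lambda in nm, h*c ~ 1239.84 eV nm.
HC_EV_NM = 1239.84198


def photon_energy_ev(wavelength_nm: float) -> float:
    """Energy of a photon of the given wavelength, in electronvolts."""
    return HC_EV_NM / wavelength_nm


def stokes_shift_cm(excitation_nm: float, emission_nm: float) -> float:
    """Stokes shift expressed as a wavenumber difference in cm^-1."""
    return 1e7 / excitation_nm - 1e7 / emission_nm


# Illustrative tryptophan-like excitation/emission pair (assumed values).
ex_nm, em_nm = 280.0, 350.0
print(f"excitation photon: {photon_energy_ev(ex_nm):.2f} eV")
print(f"emission photon:   {photon_energy_ev(em_nm):.2f} eV")
print(f"Stokes shift:      {stokes_shift_cm(ex_nm, em_nm):.0f} cm^-1")
```

The emitted photon carries less energy than the absorbed one; in this example the difference, the Stokes shift, is roughly 7100 cm⁻¹.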
Spectroscopy is a fast and easily applicable measurement method. The time-consuming part is the calibration procedure needed to obtain reliable chemometric models for predicting the target variables. Usually, different pre-processing methods are used to smooth the data, reduce noise and correct the baseline. Well-established methods are the Savitzky-Golay filter 26 , multiplicative scatter correction (MSC) 27 and the standard normal variate transformation (SNV) 28 . Partial least squares regression (PLSR), principal component regression (PCR) or artificial neural networks (ANN) are used in most cases to correlate the spectra with the calibration data 29 .
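Two of the pre-processing methods named above, MSC and Savitzky-Golay filtering, can be sketched in Python with NumPy and SciPy as follows. This is an illustrative sketch, not the authors' Matlab code: the use of the mean spectrum as the MSC reference, the Savitzky-Golay window length (11 points) and polynomial order (2), and the synthetic spectra are assumptions, and `msc` is a hypothetical helper name.

```python
import numpy as np
from scipy.signal import savgol_filter


def msc(spectra, reference=None):
    """Multiplicative scatter correction (MSC).

    Each spectrum x_i (one row) is fitted to a reference spectrum as
    x_i ~ a_i + b_i * ref by least squares and corrected to (x_i - a_i) / b_i.
    By default the mean spectrum of the set is used as the reference.
    """
    spectra = np.asarray(spectra, dtype=float)
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for i, x in enumerate(spectra):
        slope, intercept = np.polyfit(ref, x, deg=1)
        corrected[i] = (x - intercept) / slope
    return corrected


# Synthetic example: one band shape with different offsets and scaling,
# as could arise from differences in light scattering between samples.
points = np.linspace(0.0, 1.0, 200)
band = np.exp(-((points - 0.5) / 0.1) ** 2)
spectra = np.vstack([0.2 + 1.5 * band, -0.1 + 0.8 * band, 0.05 + 1.1 * band])

corrected = msc(spectra)  # all rows now coincide with the mean spectrum

# Savitzky-Golay smoothing and a smoothed first derivative, applied row-wise.
rng = np.random.default_rng(0)
noisy = spectra + rng.normal(0.0, 0.01, spectra.shape)
smoothed = savgol_filter(noisy, window_length=11, polyorder=2, axis=1)
first_derivative = savgol_filter(noisy, window_length=11, polyorder=2, deriv=1, axis=1)
```

Because each synthetic spectrum is an exact affine transform of the mean spectrum, MSC maps all of them onto the same curve, which is the behaviour MSC is designed for.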
In the present work, different spectroscopic methods were used to identify major classes of organic compounds in chia seeds, followed by chemometric evaluation of spectra taken from different chia seeds. The chia seeds were sampled directly from farms in diverse agro-ecological zones of Kenya and compared with chia seeds sold on the German market, which mostly originated from South America. For the first time, the nutritional composition was evaluated by fluorescence and NIR spectroscopy in parallel, enabling a fast determination. The study characterises the essential nutritional and chemical composition of chia seeds, which supports their utilization for human health benefits and as an important ingredient in functional food.

Materials and methods
For the study, 25 samples of chia seeds were examined spectroscopically. These include 9 samples from Kenya (named A to I) cultivated at different sites. The chia seeds were collected from different regions of Kenya in accordance with the relevant institutional research policy, DeKUT RESEARCH POLICY, August 2016, and the national guidelines LEGAL NOTICE No. 106, THE SCIENCE, TECHNOLOGY AND INNOVATION ACT, 2013 (No.28 of 2013). The remaining 16 samples (named J to Y, Table 1) were obtained from different local and online markets in Germany and originated from Mexico, Bolivia, Paraguay and Argentina. Nutrition information according to the product packaging, vendor or distributor is presented in Supplementary Material 1.
This information was not available for the Kenyan samples A to I; therefore, the raw fat and protein contents of all samples were determined by the Analytical Chemistry module of the Core Facility Hohenheim. For the Kenyan samples, the fatty acid profiles were determined as well and are presented in Supplementary Material 2. For spectroscopic evaluation, the samples were ground with a centrifugal mill (ZM 100, Retsch Technology GmbH, Düsseldorf, Germany) at 6000 rpm. The seeds were frozen for at least 24 h before grinding to avoid changes caused by high temperatures during milling.
2D-fluorescence spectra were obtained with the BioView sensor (Delta Light & Optics, Hørsholm, Denmark) equipped with a standard port containing a quartz glass window and a xenon lamp. Spectra were recorded at excitation wavelengths from 270 to 550 nm and emission wavelengths from 310 to 590 nm in 20 nm steps. Each resulting spectrum contained the measured intensities of 120 excitation/emission wavelength combinations. Five replicate spectra were recorded for each sample, and the vial was briefly shaken after each recording.
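The stated grid (15 excitation × 15 emission wavelengths) would contain 225 pairs rather than 120. One reading that reproduces the stated count, offered here as an assumption and not as documented instrument behaviour, is that only pairs whose emission wavelength lies at least 40 nm above the excitation wavelength are recorded, since fluorescence is emitted at longer wavelengths than the excitation light. A short Python check, where `MIN_GAP_NM` is a hypothetical parameter:

```python
# Excitation 270-550 nm and emission 310-590 nm, both in 20 nm steps.
excitation = range(270, 551, 20)   # 15 wavelengths
emission = range(310, 591, 20)     # 15 wavelengths

# Assumption: only pairs with emission at least 40 nm above excitation are kept.
MIN_GAP_NM = 40
pairs = [(ex, em) for ex in excitation for em in emission if em - ex >= MIN_GAP_NM]

print(len(excitation) * len(emission))  # size of the full grid
print(len(pairs))                       # retained combinations
```

Under this assumption, the retained pairs form a triangle of the excitation/emission matrix, from 15 emission wavelengths at 270 nm excitation down to a single one (590 nm) at 550 nm excitation.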
NIR spectroscopy measurements were performed with the Multi-Purpose NIR Analyzer (Bruker Optik GmbH, Ettlingen, Germany) over wavelengths of approximately 800 to 2800 nm (wavenumbers from 3599 to 12,489 cm⁻¹). The flour samples were filled into suitable vials and placed at the reflection position of the NIR spectrometer. Five replicate spectra were recorded for each sample, and the vial was briefly shaken after each recording.
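Wavelength and wavenumber are reciprocal (λ in nm = 10⁷ / ν̃ in cm⁻¹). The small Python check below, with hypothetical helper names, converts the stated wavenumber limits and shows that they correspond to roughly 801 to 2779 nm, so the 800 to 2800 nm range quoted for the instrument is a rounded figure.

```python
def nm_to_wavenumber(wavelength_nm: float) -> float:
    """Wavenumber in cm^-1 for a wavelength in nm (1 cm = 1e7 nm)."""
    return 1e7 / wavelength_nm


def wavenumber_to_nm(wavenumber_cm: float) -> float:
    """Wavelength in nm for a wavenumber in cm^-1."""
    return 1e7 / wavenumber_cm


# Wavenumber limits as stated for the NIR measurements.
for wn in (12489.0, 3599.0):
    print(f"{wn:.0f} cm^-1 -> {wavenumber_to_nm(wn):.1f} nm")
```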
The spectra were evaluated with Matlab R2019a (version 9.6) for all purchased samples (dataset DS1), all Kenyan samples (dataset DS2) and all samples together (dataset DS3).
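The three datasets can be expressed as boolean masks over the sample labels. The Python sketch below assumes one row per sample (for example, replicate spectra averaged beforehand), which the text does not state; samples without a reference value for the target are dropped, as is done before PCA and PLSR. The dictionary layout and the `select` helper are hypothetical.

```python
import string

import numpy as np

# Labels follow the text: A-I are Kenyan, J-Y were purchased in Germany.
labels = list(string.ascii_uppercase[:25])          # 'A' ... 'Y'
is_kenyan = np.array([label <= "I" for label in labels])

datasets = {
    "DS1": ~is_kenyan,                  # purchased samples
    "DS2": is_kenyan,                   # Kenyan samples
    "DS3": np.ones_like(is_kenyan),     # all samples
}


def select(X, y, mask):
    """Hypothetical helper: rows of X and y that belong to a dataset mask and
    have a reference value (samples whose y is NaN are left out)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = mask & ~np.isnan(y)
    return X[keep], y[keep]


for name, mask in datasets.items():
    print(name, int(mask.sum()), "samples")
```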
The NIR and fluorescence spectra were evaluated individually and also together as a combined dataset. The spectra were pre-processed with different methods to extract the desired information. For pre-processing variant 1 (PP1), a standard normal variate transformation (SNV) was performed. For pre-processing variant 2 (PP2), a baseline correction was performed by removing the low-frequency parts of the spectra: the first derivative of the spectra was smoothed with a moving average filter (window width: number of points in the spectrum divided by 20), the smoothed first derivative was integrated, and the result was subtracted from the original spectrum. This baseline correction was applied to the NIR spectra prior to SNV. For pre-processing variant 3 (PP3), the fluorescence spectra were additionally multiplied by a factor of 0.25.
Where no measurement values for a particular variable were available, the samples were left out before applying Principal Component Analysis (PCA) and Partial Least Squares Regression (PLSR). A PCA with 10 principal components was performed, and the offline values were then correlated with each of these first 10 principal components to check whether the datasets contain correlations with the target values (fat and protein content). For the PLSR models, 1 to 32 components were tested. For all datasets, with single and combined spectra, cross-validation (CV) 30 was carried out, and the coefficient of determination R² and the root mean square error of prediction RMSEP (absolute error) were calculated. Furthermore, the RMSEP was expressed relative to the range of the sample values; this is named RMSEP range (percentage error).
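The three pre-processing variants can be sketched in Python as below. This is a hedged NumPy re-expression of the description, not the authors' Matlab implementation: the finite-difference derivative, the edge padding before the moving average, the integer window (points // 20), the sample standard deviation in SNV, and the assumption that fluorescence spectra receive only SNV in PP2 and PP3 are choices made for illustration. `snv`, `baseline_correct` and `preprocess` are hypothetical helper names.

```python
import numpy as np


def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row)."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def baseline_correct(spectrum):
    """Remove low-frequency content: smooth the first derivative with a moving
    average (window = number of points // 20), integrate it back, and subtract
    the result from the original spectrum."""
    x = np.asarray(spectrum, dtype=float)
    w = max(1, x.size // 20)
    d = np.diff(x)                                         # finite-difference derivative
    padded = np.pad(d, (w // 2, w - 1 - w // 2), mode="edge")
    d_smooth = np.convolve(padded, np.ones(w) / w, mode="valid")
    baseline = x[0] + np.concatenate(([0.0], np.cumsum(d_smooth)))  # integrate
    return x - baseline


def preprocess(nir, fluo, variant):
    """Combined (NIR + fluorescence) matrix for PP1, PP2 or PP3."""
    if variant == "PP1":
        nir_p = snv(nir)
    elif variant in ("PP2", "PP3"):
        nir_p = snv(np.apply_along_axis(baseline_correct, 1, nir))
    else:
        raise ValueError(f"unknown variant: {variant}")
    fluo_p = snv(fluo)
    if variant == "PP3":
        fluo_p = 0.25 * fluo_p      # reduce the fluorescence intensity level
    return np.hstack([nir_p, fluo_p])


# Synthetic stand-ins: 4 samples, 300 NIR points, 120 fluorescence points.
rng = np.random.default_rng(0)
nir = rng.normal(size=(4, 300)).cumsum(axis=1)
fluo = rng.random((4, 120))
X_pp3 = preprocess(nir, fluo, "PP3")
print(X_pp3.shape)
```

Integrating the smoothed derivative reconstructs only the slowly varying part of each spectrum, so subtracting it leaves the sharper spectral features; a purely linear drift is removed completely.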

Results and discussion
Chia seeds from Kenya have fat and protein contents similar to those of the South American seeds (compare Table 1; Supplementary Material 1). Table 1 shows single determinations of protein and fat contents; for the Kenyan samples, they range from 18.4 to 24.7% for fat and from 31.5 to 35.8% for protein, which is within the range of the values determined for the samples from Middle and South America. The fatty acid composition is also in the range of the sample from Bolivia, which was additionally evaluated as a reference (compare Supplementary Material 2). It was therefore expected that the spectra would also be similar. Representative NIR and fluorescence spectra of chia seeds are presented in Figs. 1 and 2, respectively. Sample N is the only sample whose seeds have a white surface. Differences between the spectra are already visible, but for both measurement methods they might be caused by the inhomogeneous sample surfaces. Three different pre-processing variants were tested. A simple SNV transformation was performed first, so that the spectral data were modified as little as possible. As the results were not satisfactory, two further pre-processing methods were tested. Raw and pre-processed (all variants) combined fluorescence and NIR spectra are presented in Fig. 3.
Table 2 presents the best correlations between the principal components of the fluorescence or NIR spectra, respectively, and the measured data. Using only fluorescence spectra, coefficients of determination below 0.5 were obtained. For the fluorescence spectra of dataset DS3, the best results were obtained without pre-processing (R² = 0.27 for fat and R² = 0.34 for protein); for datasets DS1 and DS2 the results were worse. Thus, no clear correlation between fat or protein content and the fluorescence data could be found.
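The correlation check between principal components and the offline fat or protein values could look like the following Python sketch. It computes a plain SVD-based PCA on mean-centred data and the squared Pearson correlation of each of the first 10 score vectors with the target; whether the authors used exactly this R² definition is an assumption, and `pc_target_r2` and the synthetic data are hypothetical.

```python
import numpy as np


def pc_target_r2(X, y, n_components=10):
    """Squared Pearson correlation between each PCA score vector and the target.

    PCA is computed via SVD of the mean-centred data matrix X (samples x variables).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    k = min(n_components, s.size)
    scores = U[:, :k] * s[:k]                       # PCA scores T = U * S
    r = [np.corrcoef(scores[:, j], y)[0, 1] for j in range(k)]
    return np.square(r)


# Synthetic data whose dominant direction of variation tracks the target.
rng = np.random.default_rng(1)
y = rng.normal(size=30)
direction = rng.normal(size=50)
direction /= np.linalg.norm(direction)
X = np.outer(y, direction) + 0.01 * rng.normal(size=(30, 50))

r2 = pc_target_r2(X, y)
print(np.round(r2[:3], 3))
```

A high R² for an early component indicates that the dominant spectral variation is related to the target, which is what this screening step is meant to reveal before PLSR modelling.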
Table 1. Information about growing region and determined values for fat and protein of African (A-I) and purchased (J-Z) chia seeds. Single determinations performed by the Analytical Chemistry module of the Core Facility Hohenheim.
As NIR spectroscopy is well established for protein and fat determination in food, the coefficients of determination were better here. For NIR spectra, PP2 led to the best results for DS1, with R² = 0.72 for fat and R² = 0.6 for protein. For DS2 and DS3, correlations were still found with PP2. Applying a baseline correction to the NIR spectra therefore led to a small improvement of the results compared with PP1, where only SNV was performed. As expected, the correlations for the combined spectra were weaker than those for NIR alone but stronger than those for fluorescence alone. The PCA thus showed that correlations can be found. The best results of the cross-validated PLSR models are presented in Table 3.
For DS1, the PLSR predictions from fluorescence spectra alone and from combined spectra gave the best results with PP1 for both fat and protein. For DS2, PP1 was also the best pre-processing method for the individual fluorescence and NIR evaluations, but combining fluorescence and NIR spectra improved the results: for fat, PP1 remained the best (R² = 0.92), whereas for protein PP3 was best (R² = 0.97). For DS3, the combined evaluation also improved the results compared with the single evaluations; the fluorescence spectra alone achieved only R² = 0.61 for fat and 0.72 for protein (PP1), and the NIR spectra alone R² = 0.82 for fat and 0.88 for protein (PP2). The combination resulted in R² = 0.85 for fat (PP3) and 0.91 for protein (PP1). The best PLSR prediction results for the determined fat and protein contents are depicted in Fig. 4. Using the nutritional values given by the manufacturers/vendors of the chia seeds, the best prediction results were obtained for saturated fatty acids and dietary fibre (R² = 0.97) with combined spectra, as presented in Table 4. Evaluating the fluorescence spectra alone led to good results (R² > 0.9) for energy (kcal), fat and saturated fatty acids, whereas evaluating the NIR spectra alone led to R² > 0.8 for dietary fibre and protein. The nutritional values given by the distributors are average values that are not determined anew for every batch, so it is understandable that the prediction results are worse than those for the fat and protein values determined in the laboratory. It was shown that the nutritional values of chia seeds can be predicted by fluorescence and NIR spectroscopy and that combining both methods improved the results. However, increasing the range of nutrient diversity or selecting samples with higher variation could improve the prediction results.

Conclusion
The presented results show that the combined evaluation of NIR and fluorescence spectra is suitable for predicting the nutritional values of chia seeds. As expected, the best prediction results were obtained for fat and protein with combined spectra. For all samples, the RMSEP was 0.51 g/100 g for fat and 0.62 g/100 g for protein (8.98% and 9%, respectively, relative to the sample range). For the Kenyan samples only, the best prediction errors were 0.13 g/100 g for fat and 0.19 g/100 g for protein (2.99% and 2.97%, respectively, relative to the sample range). For the purchased samples only, the errors were 0.32 g/100 g for both fat and protein (6.13% relative to the sample range). For the nutritional values given by the distributors of the purchased chia seeds, the prediction results were best for fat, saturated fatty acids and protein, with prediction errors below 0.7 g/100 g (below 10% relative to the sample range), which compares well with the other values. Further studies are necessary to improve the prediction quality. The prediction error is expected to decrease if the range of nutritional and chemical composition of the samples is increased. Furthermore, alternative pre-processing and evaluation methods might lead to better results. Nevertheless, because the spectroscopic methods allow a fast determination of the nutritional and chemical composition of the samples, they are a promising alternative to the current standard methods.
Table 2. The two best correlation results of the PCA without and with pre-processing for the single and combined spectra and all datasets (DS). R² is the coefficient of determination, RMSEP is the absolute root mean squared error of prediction, and RMSEP range is the percentage error. PP indicates pre-processing; PP1: SNV; PP2: SNV and baseline correction; PP3: SNV, baseline correction and reducing factor for fluorescence spectra.
Table 3. Results of the PLSR model prediction of fat and protein of chia seeds with single and combined spectra of fluorescence and NIR, without pre-processing and with the three pre-processing variants, for all three datasets (DS). R² is the coefficient of determination, RMSEP is the absolute root mean squared error of prediction, and RMSEP range is the percentage error. PP indicates pre-processing; PP1: SNV; PP2: SNV and baseline correction; PP3: SNV, baseline correction and reducing factor for fluorescence spectra.
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.