Machine Learning Allows Calibration Models to Predict Trace Element Concentration in Soils with Generalized LIBS Spectra

Determination of trace elements in soils with laser-induced breakdown spectroscopy is significantly affected by the matrix effect, due to large variations in chemical composition and physical property of different soils. Spectroscopic data treatment with univariate models often leads to poor analytical performances. We have developed in this work a multivariate model using machine learning algorithms based on a back-propagation neural network (BPNN). Beyond the classical chemometry approach, machine learning, with tremendous progresses the last years especially for image processing, is offering an ensemble of powerful and constantly renewed algorithms and tools efficient for the different steps in the construction of a spectroscopic data treatment model, including feature selection and neural network training. Considering the matrix effect as the focus of this work, we have developed the concept of generalized spectrum, where the information about the soil matrix is explicitly included in the input vector of the model as an additional dimension. After a brief presentation of the experimental procedure and the results of regression with a univariate model, the development of the multivariate model will be described in detail together with its analytical performances, showing average relative errors of calibration (REC) and of prediction (REP) within the range of 5–6%.

www.nature.com/scientificreports www.nature.com/scientificreports/ Raw experimental data. Six replicate (j) spectra were taken for each pellet. A spectrum was an accumulation of 200 laser shots distributed over 10 distinguished sites on the sample surface ablated each by 20 successive laser pulses. An individual spectrum can thus be notated by → I ij t (for the j th replicate measurement on the sample with analyte concentration Co ti prepared with the soil type t). A typical replicate-averaged spectrum is presented in Fig. 1, showing in particular the emission line chosen for Ag emission intensity measurement, the Ag I 328.1 nm line.
Quantitative analysis performances with univariate calibration. Calibration curves based on univariate regression are constructed by representing the intensities of Ag I 328.1 nm line, → I ij t (Ag I 328.1 nm), as a function of the Ag concentrations of the corresponding calibration samples. As we can see in Fig. 1, this line is enough intense and seems free of interference with other lines, its intensity is still not particularly high to avoid significant self-absorption to occur. A linear regression of the line intensities as a function of the Ag concentrations for a given soil type results in a soil-specific calibration curve for each of the 4 analyzed soils as shown in Fig. 2. The error bars in the figure are standard deviations ( σ ± I i t) of the intensities calculated for the 6 replicate measurements performed for each sample pellet. Large error bars associated to the line intensities indicate the effect of the emission source noise as we discussed above. The same noise leads to the reduced values of the determination coefficient r 2 with respect to the unit for each of the 4 soils. In addition, the slopes of the calibration curves are significantly different, showing an obvious matrix effect in LIBS analysis of the 4 soils. By merging the line intensities from the calibration samples of the 4 types of soil, a soil-independent calibration curve can be established. For this purpose, the intensity data from the 4 soils are plotted as a function of Ag concentrations in a same figure as shown in Fig. 3, and a linear regression of the data leads to the soil-independent univariate calibration curve. We can see in this figure that a large dispersion of emission intensities for samples with a given Ag concentration due to the matrix effect leads to a much reduced r 2 value for the soil-independent calibration curve.
Line intensities from the validation samples in Table 1 are then used to evaluate the accuracy and the precision of prediction using the established calibration models. These intensities are represented by crosses in Figs 2 and 3. Table 2 sums up the figures of merit 35,40 of quantitative analysis performances using the univariate model with both the soil-specific and the soil-independent calibration curves, where REC(%) is average relative error of calibration, REP(%) average relative error of prediction, RSD(%) relative standard deviation of the predicted concentrations, and LOD(ppm) limit of detection. The definitions of the above quantities are given in the section "Methods".  www.nature.com/scientificreports www.nature.com/scientificreports/ We can see in Table 2 that the soil-specific calibration curves have fair r 2 values, while their slopes are significantly different, as also shown in Fig. 2, indicating significant influences of both emission source noise and matrix effect. The accuracies of calibration and prediction, indicated respectively by REC (mean value 18.28%) and REP (mean value 37.07%) of the soil-specific calibration curves are not satisfactory for quantitative analysis. This is due to a limited measurement repeatability from one sample to another. On the other hand, a limited repeatability of replicate measurements leads to an unsatisfactory prediction precision (mean value of RSD = 42.35%), as well as a quite high LOD (mean value of LOD = 24.23 ppm), compared to standard LIBS measurement performances for solid samples.  When the soil-independent calibration curve is considered, we can find a degraded r 2 value, indicating a significant influence of matrix effect. The determined slope of the calibration curve logically corresponds to the average value of the slopes of the 4 soil-specific calibration curves. We can also see that the mentioned matrix effect also degrades the accuracy of calibration (REC) due to a larger dispersion of the line intensities participating in the construction of the soil-independent calibration curve. The degraded calibration accuracy becomes comparable to the prediction accuracy (REP) when all the soil types are included for model validation, since both of them are directly influenced by the matrix effect. Due to the same influence, the prediction precision (RSD) with all the soil types is degraded compared to the mean value of the soil-specific calibration curves. At the same time, the limit of detection (LOD) does not record significant change with respect to the soil-specific calibration models, since it is more sensitive to the fluctuation of replicate measurements due to the emission source noise.

Analytical Performances with Multivariate Calibration Model
Principle of the developed multivariate calibration model. From the above results with univariate calibration models, we can see that the analytical performances are affected by the both matrix effect and emission source noise. The developed multivariate model is therefore designed to deal with different types of soil, and in the same time, such model should efficiently reduce the dispersion of analytical results due to any change and fluctuation of experimental condition. The idea is to explicitly include the information about soil type in the input variables for the training and validation of the calibration model. More specifically, for a given reference sample with known analyte concentration Co ti prepared from the soil type t, the j th replicate LIBS measurement generates a spectrum which can be presented as a vector in the form of → = … …  I  I I  I  I  ( , ,  ,  )   ij  t  ij  t  ij  t  ijk  t  ijM  t  1  2 1 , where M 1 is the dimension of the spectrum (pixel number of a raw spectrum or the number of contained intensities in a pretreated spectrum). Such physical spectrum can be concatenated with an ensemble of M 2 variables, 2 ), representing the properties of the sample. The result is a generalized spectrum with M 1 + M 2 generalized intensities, Such spectrum is considered in the method as a vector in a hyperspace of , can thus be attributed to it for formally representing the concentration of the analyte (silver for instance) in the corresponding sample. Such module cannot be calculated using a simple mathematical function. The physical correlation between the generalized spectrum and the concentration of the analyte can only be expressed as a mathematical relation of mapping: In our experiment, a machine learning algorithm is used through a training process, to establish the mapping between the collection of generalized spectra and the ensemble of element concentrations of the corresponding reference samples. The result of such training process leads to a calibration model which is able to predict the concentration of the analyte in a validation sample when its generalized spectrum is used as the input of the model. The physical basis of the mapping between the generalized spectra and the elemental concentrations is the interaction between the different species in a laser-induced plasma, which leads to the correlation of the concentration of a specific element contained in the plasma to the whole plasma emission spectrum. In our experiment, tests have been performed to establish the importance of the additional dimension in a generalized LIBS spectrum for the training of the model and the effectiveness of this dimension for the correction of the matrix effect. More detailed information is provided in the section "Methods". www.nature.com/scientificreports www.nature.com/scientificreports/ Implementation of the method: flowchart of training and validation of the calibration model. Figure 4 shows the flowchart of the developed multivariate calibration method. Several steps can be distinguished in a successive way.
Step 1. Data set organization, pretreatment and formatting. The experimental data are organized in this step in the way shown in Table 3, where we can see that for each soil type, 6 pellets with different analyte concentrations are selected for the calibration sample set (i ∈ {1, 2, 3, 5, 6, 7}), and the rest one (i ∈ {4}) for the validation sample set. In order to have a clear vision of the structure of the experimental data, they are presented within a rectangular parallelepiped as shown in Fig. 5. An individual raw spectrum, → = … … ( ) , is represented in the rectangular parallelepiped by a cube with a set of given values of (t, i, j), here the index k is used to indicate a pixel in the spectrum: 1 ≤ k ≤ M 0 , M 0 = 21915 is the pixel number of a raw spectra, which physically corresponds to the spectral range of the used spectrometer, 220 nm ≤ λ ≤ 850 nm.
Pretreatment is performed on the raw spectra, which consists in (i) normalization and (ii) feature selection. The normalization, applied to all the raw spectra of the laboratory-prepared reference samples, is a simple operation which transforms the intensity rang of each pixel of all the raw spectrum into the interval between 0 and 1:  where i ∈ {1, 2, 3, 4, 5, 6, 7}, I k min and I k max are respectively the minimum and the maximum of the pixel k among the same pixels of all the individual spectra (4 × 7 × 6 = 168 spectra). Such normalization reduces the contrast among the pixel intensities of a raw spectrum, which can initially exceed one order of magnitude for a large part of the pixels as shown in Fig. 1. Since one could expect smaller variations among the intensities of the different individual spectra for a given pixel, unless a physical reason, variation of the analyte concentration for example, makes them to change in a correlated way. After the normalization, all the pixels of an individual spectrum, whatever their initial physical intensities, should contribute in a more statistically equivalent way, to characterize it with respect to the other ones.
Feature selection is performed by applying the SelectKBest algorithm 41 to the normalized spectra of the calibration sample set. The principle consists in selecting and keeping in an individual spectrum for the further processing, pixel intensities with high enough correlation with the series of analyte concentrations of the calibration sample set. Such correlation is calculated in the algorithm with a score function Score(k 0 ): , is the number of the individual spectra in the calibration sample set, I k norm 0 stands for the mean value of normalized intensity of the pixel k 0 (hence the corresponding wavelength) with respect to the measurement replicates, the soil types and the prepared concentrations of the calibration samples; and Co refers to the mean value of the prepared concentrations of the calibration samples. In the case of model training with a given type of soil, the above sums with respect to t reduce to a single corresponding term. The threshold value applied to Score(k 0 ) for feature selection takes into account the number of individual spectra www.nature.com/scientificreports www.nature.com/scientificreports/ included in the calibration sample set, for instance  = × × = 6 4 6 144. In this work, 150 pixels were selected over the initial 21915 ones, so that the reduced normalized spectrum have a dimension of M 1 = 150. Such dimension is comparable to the total number of spectra used in the calibration sample set, reducing thus the risk of overfitting.
The spectrum of selected features is shown in Fig. 6. We can see that pixels (or equivalently wavelengths) receiving the highest scores are concentrated in the spectral range from 327 nm to 339 nm as shown in Fig. 6(b), and correspondingly in Fig. 6(a). We can identify 2 neutral silver lines: Ag I 328.1 nm with a NIST relative intensity 42 of 55000 and Ag I 338.3 nm with a smaller NIST relative intensity of 28000. The experimental spectrum shows however I(at 338.3 nm) > I(at 328.1 nm). A detailed inspection in the NIST Atomic Spectra Database 42 shows the presence of a relatively intense titanium ionic line, Ti II 338.4 nm line, with a NIST relative intensity of 7100. Since titanium represents an important trace element in soil, this line can therefore significantly interfere with the Ag I 338.3 nm line. This is why the pixels in the Ag I 328.1 nm line receive higher scores, while a part of the pixels corresponding to the Ag I 338.3 nm line receive lower scores, and those pixels are all situated in the low frequency side of the intensity peak around 338.3 nm. A second zone where high score features are found extends from 750 nm to 850 nm, where we can remark the correspondence between the selected features and the lines emitted by oxygen and nitrogen atoms, which are mainly contributed by the ambient gas. Correlation between the elements from the ambient gas (especially O and N) and an element to be detected in the sample has been studied in our previous work 33 . Clear physical interpretation of the selected spectral features demonstrates the significance of the used SelectKBest algorithm.
In our experiment, the type of soil is the only significant information which distinguishes the 4 soils (with the same preparation procedure), it is thus concatenated with the normalized and reduced spectrum to form generalized spectrum: are arbitrarily chosen for representing the 4 soil types, N1, N2, U1 and U2 respectively.
Step 2. Model initialization. Back-propagation neural network (BPNN) 43 is chosen in this work to provide the algorithm which maps the generalized spectra and the corresponding analyte concentrations. Such choice is motivated by the fact that BPNN corresponds rather to an algorithm in machine learning than a specific neural network (NN). The back-propagation procedure (i.e., the application of the chain rule to calculate derivatives of composite functions) allows training a neural network using gradient-type optimization algorithms such as stochastic gradient descent (SGD), which is one of the most successful and widely used training algorithms in machine learning. The number of hidden layers n_layers, the number of nodes in a hidden layer n_nodes, the learning rate and the maximum epochs of BPNN are selected as the externally adjustable parameters to optimize the performance of the model. The model starts with its default state denoted by f (0) . The internal loop (Step 3′ in Fig. 4) is devoted to train the algorithm in such way for an input individual generalized spectrum , the resulted generalized module becomes as close as possible to the targeted analyte concentration Co ti . In considering the statistical equivalence and experimental fluctuation among the replicate measurements (j is a dummy index) and the matrix effect due to different soils, and in order to fulfill the requirements for the model to tackle both the experimental fluctuation and the matrix effect, the training process is implemented in the following way: (i). Randomly permuting among j of all the data columns of given t and Co ti , in order to randomly and independently fix the arrangement of all the 24 columns of replicate spectra as shown in Fig. 7a, with the arrangements visible for all the columns in the two surfaces of t = U2 and Co ti = 800(840) ppm of the date cube; (ii). For one of the data configurations (in total (6!) 24 possible and statistically equivalent ones) generated in the above way, performing a dynamic cross validation training process of 6 iterations. In each of these iterations, successively one layer of the data, for example the top layer, then the second, then the third…, up to the bottom one, is considered as the test data set, while the rest as the calibration data set as shown in Fig. 7b. In such iteration, the algorithm corresponding to a training model, f (1) , is trained, with the calibration data set, in order for the output generalized modules of the individual generalized spectra to be as close as possible to the silver concentrations of the corresponding targets. These iterations generate 6 different BPNNs. (iii). In the end of the above 6-fold iterative training and cross validation process, another randomly and independently arranged data configuration is generated for a new 6-fold iterative cross validation training of the algorithm. In the experiment, we fixed the considered number of randomly and independently arranged data configurations to 10, because a larger number of data configurations would not significantly enrich useful information that we can extract from the given ensemble of raw experimental spectra. In the end of the 10-data-configuration training, 60 different BPNNs are generated. (iv). The average relative error of calibration (REC) is calculated. If the value is larger than the fixed threshold, the process goes back to the training step of f (1) . Otherwise a calibration model for test, f (2) , is generated. (v). f (2) is then tested by the test data set in a similar way as the above training process. The average relative error of test (RET) is calculated. (vi). The resulted REC and RET are compared to the fixed threshold values. If they, or one of them, are larger than the threshold value(s), the process goes to the external loop. Otherwise a calibration model for validation, f (3) , is generated.
In this experiment, the threshold values were fixed for 10-data-configuration resulted REC and RET both at 5.50%. This value was chosen to minimize the average relative error of prediction (REP) calculated in the validation process of the calibration model for validation, f (3) , using generalized validation spectra, which were not involved in the model training process. Numerical experiments were thus necessary to determine these thresholds, even though the values could be intrinsically smaller if only the model training process in the step 3 is concerned.
The detailed definitions of REC, RET and REP for the assessment of the multivariate model are given in the section "Methods".
Then only one type of soil is under consideration, t takes a fixed valued among N1, N2, U1, U2.
The external loop of this step is aimed to optimized the externally adjustable parameters of the algorithm, BPNN for instance. The used method is grid-search parameter tuning, which is known as an efficient method of optimization for constructing a calibration model. In this method, for given ranges of the selected adjustable parameters, the performance of the model is evaluated for all the possible combinations of the adjustable parameters in an exhaustive way. The combination generating the best performance is retained. In our experiment, the ranges of the 2 externally adjustable parameters, n_layers and n_nodes, both positive integer, were respectively fixed being 1 to 2 and 3 to 8, 12 combinations were therefore evaluated.
When the values of REC and RET are simultaneously smaller than 5.50%, the iteration in the external loop stops. A calibration model for validation f (3) is generated as the output of the step 3. In our experiment, f (3) is obtained with the externally adjustable parameters of n layers = 1, n nodes = 5, learning rate = 0.2, and maximum epochs = 10000. The fact that the final BPNN structure is optimized with a single hidden layer corresponds well to the universal approximation theorem which states that a BPNN with 3 layers (the input, the hidden and the output layers) under mild assumptions on the activation function is enough for fitting any continuous function over a finite dimension compact set [44][45][46] .
Step 4. Model validation with an independent validation sample set. The output model of the step 3, f (3) , is validated in this step using generalized validation spectra obtained from the validation samples which is not involved in the model training process. Average relative error of prediction (REP) and average relative standard deviation (RSD) are calculated for individual generalized validation spectra of the validation sample set to respectively evaluate the prediction accuracy and precision of the model. After the validation, the final calibration model, f, is generated with the corresponding REP and RSD, indicating its performances.

Results and Discussions
Soil-specific and soil-independent calibration curves are respectively shown in Figs 8 and 9. We use here a similar presentation as in Figs 2 and 3 to easy the comparison with the univariate models. And the parameters showing the analytical performances of the multivariate models are presented in Table 4.
We can see that the soil-specific calibration curves exhibit all a r 2 value very close to the unit. This means that the multivariate models efficiently reduce the experimental fluctuation from a reference sample to another. The fluctuation from a replicate measurement to another for a given sample is also significantly reduced, which leads to a very small error bar on each predicted concentration. A direct consequence of such reduced fluctuations is a significant improvement of LODs from several tens ppm to around ppm. In coherence with the high r 2 values, the calibration accuracy is greatly improved and reaches now an impressive level of around 1% for REC and RET. The prediction capacity of the soil-specific calibration models is clearly reinforced with order-of-magnitude reduction for both REP and RSD compared to those of the univariate model. Such performance clearly fulfills the requirements of precise and accurate quantitative analyses.
When the calibration spectra from all the soils are used to build a calibration model, the soil-independent calibration curve is obtained as shown in Fig. 9. We can see a r 2 value very close to the unit as in the case of soil-specific multivariate calibration curves. This does not only mean an efficient improvement of the repeatability from a sample to another for a given type of soil, but more importantly shows the ability of the multivariate model to take into account the specific matrices between the different soils and to reduce the matrix effect. In fact, the data from the different types of soil can be fitted with a unique linear model with a determination coefficient r 2 very close to those of soil-specific calibration curves. The LOD allowed by the soil-independent multivariate model remains quite low in the order of 5 ppm. A slight increase of this value with respect to those of soil-specific calibration curves would indicate a residual matrix effect. The same residual matrix effect should contribute to slightly reduce the calibration accuracy compared to the soil-specific calibration curves, as indicated by the values of REC and RET in the order of 5% for the soil-independent model. Compared to the univariate model, the performance of the calibration curve is greatly improved with a matrix effect reduced within an acceptable level.
Concerning the prediction capacity of the multivariate soil-independent model, great improvements can be observed with respect to the univariate model for the accuracy as well as for the precision, although degradations are observed compared to the multivariate soil-specific models. Such degradations should be related to the above mentioned residual matrix effect, which would lead to, sometimes, unexpected large values of REP and RSD when Scientific RepoRtS | (2019) 9:11363 | https://doi.org/10.1038/s41598-019-47751-y www.nature.com/scientificreports www.nature.com/scientificreports/ the model is validated by the spectra from specific soils, which is the cases for the validations with N1 (specified with an informative Ag initial concentration of 40 ppm weight from NIST) with a large REP and N2 with a large RSD. Nevertheless, when the model is validated by the validation spectra of all the soils, the prediction capacity exhibits an excellent level, as indicated by the values of corresponding REP and RSD in the range of 5-6%, which is order-of-magnitude improved compared to the univariate model. The degradations observed for the soil-independent model with respect to the soil-specific models seem suggesting possible improvements with a better correction of matrix effect, which might need an enlarged number of soil types used for the training of the multivariate model.  In this work, a multivariate calibration model has been developed with machine learning algorithms. The purpose is to strength the quantitative analysis ability of LIBS by efficiently correcting fluctuations due to the emission source noise and deviations due to the matrix effect. In an application case as important as soil analyses, such fluctuations and deviations prevent a univariate calibration model from being sufficient for precise and accurate quantitative analysis of the contained trace elements. The multivariate calibration model has been therefore designed for taking into account the specificities of different soils and in the same time, efficiently reducing data dispersions due to experimental fluctuations. A key point is to introduce the concept of generalized spectrum, in which the information about sample matrix is explicitly included. BPNN has been used to map a generalized spectrum to the corresponding analyte concentration. A training process, including data pretreatment, model initiation, model training loops and model validation, has been implemented within the framework of Python programing language. In the data pretreatment, a feature selection with the SelectKBest algorithm reduces the dimension of a spectrum to a value compatible with the number of the raw spectra, limiting thus the risk of overfitting, and in the same time efficiently extracts the most significant features for characterizing the spectrum. The resulted multivariate model shows great improvements with respect to the univariate one. The fluctuation over the replicates is efficiently reduced, leading to very small error bars on the predicted concentrations. The improvement of sample-to-sample repeatability for a given soil type further allows the soil-specific calibration curves exhibiting a r 2 value exceeding 0.9999, a calibration accuracy reaching 1% level, and a LOD being down to the order of ppm. When being validated by independent samples, the prediction capacity of the soil-specific models presents high performance in terms of accuracy (mean REP = 2.59%) as well as precision (mean RSD = 2.04%). When the soil-independent calibration model is considered, the result of matrix effect correction is impressive with order-of-magnitude improvements with respect to the univariate model. Thereby, the accuracy and the precision of the predictions are both improved into the range of 5-6%. Our works have demonstrated the effectiveness and the advantage of applying machine learning to treat LIBS spectra of soils. The perspective to generalize the developed method to LIBS analysis of other materials, and furthermore to other spectroscopies is certainly worth to be mentioned here. Such generalization indeed allows spectroscopic techniques benefiting from the tremendous progresses realized today in machine learning, and opens wider application perspectives.

Soil samples and their preparation.
Four different soils were analyzed in the experiment. Two of them were standard reference materials (SRM) from National Institute of Standards and Technology (NIST): NIST 2710 (https://www-s.nist.gov/srmors/view detail.cfm?srm = 2710a) and NIST 2587 (https://www-s.nist.gov/ srmors/view detail. cfm?srm = 2587), and respectively named as N1 and N2 in this work. The other 2 soil samples were collected from 2 different places near Lyon in France, one near a river (sand-like soil) and another in an agriculture field (yellow colored soil) with unknown elemental compositions and named as U1 and U2 in this work. The 2 NIST samples were provided in fine and uniform powder of particle size <75 μm (200 mesh). The 2 collected samples were first dried, separated from small stones and organic materials, ground and then sequentially sieved through stainless steel sieves of 100, 200 and 400 mesh, assisted by an electromagnetic vibratory shaker, finally resulting particles with sizes of <38 μm. In each type of soil powders, silver (Ag) as analyte was added in different concentrations, by mixing the soil powders with Ag solutions at different concentrations obtained by dilution with deionized water of an Ag standard solution (2% nitric acid solution at an Ag concentration of 1000 mg/L from SPEX CertiPrep). Notice that the initial content of Ag in the collected soils was negligible (under the limit of detection of the experimental setup). For the 2 NIST samples, the N1 sample was specified with an informative initial silver concentration of 40 mg/kg (40 ppm weight). For the N2 sample, there was no specification of silver concentration. Doped powders were prepared in pellets of different soils and different Ag concentrations. For the preparation of a pellet, 0.2 g soil powder was pressed without binder under a pressure of 667 MPa (6.8 t/cm 2 ) for 5 min to form a pellet with a diameter of 13 mm.   www.nature.com/scientificreports www.nature.com/scientificreports/ Experimental setup and measurement protocol. The experimental setup used to produce the LIBS spectra has been described in detail elsewhere 33,47 . The following experimental parameters were used for the spectrum acquisition in this experiment: laser wavelength 1064 nm; laser pulse energy 60 mJ; diameter of the focused laser spot on the sample surface ~300 µm, estimated laser fluence on the sample surface 85 J/cm 2 and ablation under the atmospheric ambient. The emission from a zone around the symmetry axis of the plasma situated at a height of 1.3 mm from the sample surface was captured and coupled to an Echelle spectrometer (Andor Technology Mechelle 5000). The spectral range of the spectrometer was 220-850 nm, with a resolution power of λ/Δλ ≈ 5000. The intensified CCD camera (ICCD) coupled to the spectrometer was triggered by laser pulse and set with a delay of 1 µs and a gate width of 2 µs. A gain of 60 (maximum 250) was applied of the intensifier of the ICCD for all the measurements. For each sample pellet of given Ag concentration of each type of soil, 6 replicate spectra were taken. Each spectrum was an accumulation of 200 laser shot distributed over 10 sites ablated each by 20 consequent laser pulses. Between 2 neighbor ablation sites, a translation stage displaced the pellet over a distance of 600 µm in order to avoid overlapping between the sites.
Assessment of univariate calibration model. For a given type t among the T types of soil (T = 4 in this experiment), an ensemble of laboratory-prepared reference samples with different analyte concentrations are separated into a calibration sample set and a validation sample set 35 • Determination coefficient r 2 (the square of the correlation coefficient r), a usual criterion of the performance of a calibration model: res tot 2 where SS tot is the sum of squares of the experimental intensities corrected by their mean value, SS res is the sum of squares of the residuals with respect to the calibration model: where S = {1, 3, 4, 5, 7} referring to the calibration sample set.
• Average relative error of calibration REC(%) for calibration accuracy evaluation: a where σ a is the standard deviation of a, such variation is due to the dispersion of I ij t (Ag I 328.1 nm). LOD is thus determined by the sensibility of the technique (the slope b) and the repeatability and precision of intensity measurements among the different reference samples and different replicates for given samples (standard deviation of a, σ a ).
In the case of consideration of a specific soil type, the variable t takes the corresponding given value and the concerned sum reduces to a specific term in the above definitions.
Assessment of multivariate calibration model. In the experiment, the multivariate calibration model, in its different training stages f (q) , allows deducing a predicted analyte concentration  Co tij q ( ) when an individual generalized spectrum, , of a sample with a laboratory-prepared analyte concentration Co ti (targeted concentration) is used as the input variable: The following parameters are defined to assess the performance of the multivariate model: • Average relative error of calibration REC(%): is the predicted concentration corresponding to the targeted concentration Co ti by f (1) in a given iteration for a given randomly and independently arranged data configuration (o, p): is the mean predicted concentration by f (1)  is the predicted concentration corresponding to the targeted concentration Co ti by f (2) in a given iteration for a given randomly and independently arranged data configuration (o, p): ( , )  Co ti (2) is the mean predicted concentration by f (2) with respect to the laboratory-prepared reference concentration Co ti .
• Average relative error of prediction REP(%): • Limit of detection LOD(ppm), deduced by fitting with a straight line, the predicted concentrations  Co ijt by f for the calibration sample set: ij t General tij , where i ∈ {1, 2, 3, 5, 6, 7} referring to the calibration sample set, versus the corresponding prepared concentrations of the calibration sample set, Co ti : a where σ a is the standard deviation of a, such variation is due to the dispersion of  Co ijt . LOD is thus determined by the sensibility of the technique (the slope b) and the accuracy and precision of concentration prediction by the model for the different reference samples and different replicates for a given sample (standard deviation of a, σ a ).
In the case of consideration of a specific soil type, the variable t takes the corresponding given value and the concerned sum reduces to a specific term in the above definitions.

Back-propagation neuronal networks (BPNN).
A single hidden layer BPNN used in this work consists of an input layer, a hidden layer, and an output layer as shown in Fig. 10. The tanh function is used as the activation function of the hidden layer. The Stochastic Gradient Descent (SGD) 43 and Mini-batch Stochastic Gradient Descent (MSGD) 48 Table 4, we can observe a slight degradation of the calibration accuracy when the original LIBS spectra is used for model training. In contrast, an important degradation is observed for the accuracy of prediction, clearly indicating the effectiveness of the use of generalized spectra for the correction of matrix effect. In addition, we have extracted the weights applied to the outputs of the neuron in the input layer corresponding to the soil matrix information in a generalized spectrum. These outputs are respectively connected to the 5 neurons of the hidden layer. Typical values of these weights for a trained NN are 0.838435, −1.6776, −0.39414, 2.200763 and 1.503809, which is orders of magnitude larger than the mean values of the weights applied to 5 outputs of the rest of the input neurons in the same NN, with the typical values of 0.015773, −0.04353, 0.005409, −0.02946 and 0.001227. The importance of the additional dimension in a generalized LIBS spectrum related to the soil matrix is therefore clearly demonstrated together with its effectiveness for the matrix effect correction.