Introduction

Soil health indicators are a composite set of measurable physical, chemical and biological properties which can be used to determine soil health status. Among the chemical indicators, we particularly focus on nitrogen (N) because N is the most limiting nutrient in many of the world’s agricultural areas1. Insufficient use of N causes economic loss, in contrast, excessive use of N implies wasting fertilizer, causes nitrate pollution, and increases the cost2,3. Nitrogen treatment can account for up to 30% of the total production cost4.

Chlorophyll meter (CM) measures the chlorophyll content of crops to estimate their N nutrition status. In recent years, the use of CM has increased among researchers and farmers5,6. For instance, N application rates for corn were determined using the adjusted \(R^2\) of the relationship between nitrogen rate difference (ND) and CM readings7. However, CM-based methods fail to capture the spatial variability that is often present within the field. For N management, determination of spatial patterns is necessary but requires collection and analysis of a large number of samples which is labor-intensive and time-consuming2,6.

Satellite-based remote sensing is one alternative to ground-based measurements. Satellite-based techniques utilize images at the spectral level for crop growth monitoring and real-time management8,9,10. For instance, vegetation indices (VIs), evaluated using the data obtained from satellite-based multispectral sensors, have been used to detect the N stress at V4–V7 (4–7 leaves with visible leaf collar) stages11,12,13. However, satellite-based sensing suffers from lower spatial and temporal resolution, and sensing disruption may occur during image acquisition in some areas because of cloud cover and/or sprinkler irrigation14. Farmers’ adaption of the system is still limited. Additionally, the high cost of obtaining these images for relatively small areas is a significant drawback15. Multispectral cameras mounted on unmanned aerial vehicles (UAVs) have enormous potential to resolve this problem. UAVs can be deployed rapidly and frequently for image acquisition, resulting in reduced costs, greater flexibility in terms of data resolution and mission timing16,17,18. For instance, a variable rate N fertilization map was created using hyperspectral airborne images19, the ground-sensor measurements were compared with hyperspectral images to determine the N sufficiency index17, and estimation of N side-dress using NDVI computed from aerial imaging20. However, radiometric and geometric calibrations are needed for the UAVs on-board miniaturized electro-optical sensors to obtain quantitative results and provide precise georeferencing21. UAVs also fail to perform on-board mosaicking images due to limited computational resources.

Laser induced breakdown spectroscopy (LIBS) is an analytical method for qualitative and quantitative elemental detection. LIBS can be readily applied to soil samples, providing rapid chemical analysis of soil samples and their constituents (e.g., Nitrogen, Potassium, Phosphorus, Calcium). The combination of an autonomous UA, LIBS, and machine learning can be used to achieve in-field measurement which provides instant results for deficient nutrient analysis and fertilization planning. With appropriate calibration, the LIBS analysis can provide quantitative measurement for most elements in soil including, carbon, nitrogen, potassium, sulfur, and phosphorus22,23. There have been some applications of standalone LIBS systems in precision agriculture22,23,24,25. However, there has been no detailed research of LIBS application in combination with ML and UAVs. Some studies have found it challenging to measure nitrogen using LIBS due to environmental factors; Earth’s atmosphere is almost 80% nitrogen which will interfere with the sample measurement result since the soil is less than 1% nitrogen. Testing in a vacuum or in low-pressure conditions has been suggested to improve measurement accuracy23. In this study, we conducted LIBS analysis on soil samples under a normal atmosphere for observation. Low laser pulse energies were used to minimize the breakdown of air and thereby minimize the influence of atmospheric nitrogen.

The purpose of the present study is to develop a machine learning (ML)-based predictive model to estimate TN of soil using crops and soil spectral characteristics measured from the multispectral images captured from a UAV, and LIBS. Specifically, we train a multi-layer perceptron regression (MLP-R) and support vector regression (SVR) model to predict TN in soil. We use root mean square error (RMSE) and computational time (CT) as performance metrics to measure the performance of the above predictive model. To reduce the RMSE and lower CT in the machine learning models, we perform hyper-parameter optimization (HPO). The HPO tuning process depends on the ML model used for prediction26. The traditional way to tune hyper-parameter is through manual testing, although it requires a deep understanding of the ML models27. However, manual tuning is ineffective for many problems due to a large number of hyperparameters, model complexity, time-consuming model evaluations, and non-linear hyper-parameter interactions. Several HPO techniques28 have been used for different applications such as grid search (GS), random search (RS), bayesian optimization, genetic algorithm (GA), and particle swarm optimization. In this study, we implement GS, RS, and GA for hyper-parameter optimization.

Experimental design

An aerial survey was carried out with Mavic 2 Pro UAV. We obtained multispectral images using the Sentera high-precision NDVI single sensor which was mounted on the UAV (Fig. 1a). The sensor is 1.2 MP CMOS with a 60\(^\circ\) horizontal FOV and a 47\(^\circ\) vertical FOV and works with two wide spectral bands: red (625 nm CWL × 100 nm width) and NIR (850 nm CWL × 40 nm width) with a pixel count of 1248 horizontal/950 vertical. The green band is typically unused. The sensor has a total weight of 30 g and size of 25.4 × 33.8 × 37.3 mm.

Data collection: multispectral images and soil samples

The farm used for data collection is located at Sturgis, South Dakota, USA (\({44}^{\circ }\ 25'\ 27''N;\ {103}^{\circ }\ 22'\ 34''W\)). We created an autonomous UAV flight plan for minimal passes similar to a raster scan pattern using the coordinates of the four corners of the field (44.25.39 N, 103.22.60 W; 44.25.28 N, 103.22.60 W; 44.25.39 N, 103.23.16 W; 44.25.39 N, 103.23.16 W). We captured 865 multispectral images at each of the growth stages (V4, V8, and V12) and used Sentera image stitching software to mosaicking the images. The multispectral images were captured while the UAV was following the raster scan pattern using the following parameters and experimental setup,

  1. (i)

    Parameters:

    • Flight Type: QuickTile

    • Overlap Setting: 75%

    • Altitude: 60.96 m

    • Speed: 6.71 m/s

  2. (ii)

    Experimental setup:

    • Desired resolution: The Ground Sample Distance (GSD)/pixel of the multispectral camera was set to 0.05 m for a 60.96 m altitude.

    • Cloud cover and time of day: The UAV was flown when the sun was highest in the sky for more accurate data. Data is best when sky conditions are consistent, ideally 100% sunny or 100% cloudy. Flying with a mix of sun and clouds causes inconsistency in brightness and contrast while stitching images. Therefore, the stitched image will provide an inaccurate NDVI value.

Figure 1
figure 1

(a) Mavic 2 pro UAV with multispectral camera mounted. (b) The flags show the sample locations of the corresponding crops. The patches have crops including Peas, HRS Wheat, Millet, Soybean, Corn, and HRW Wheat, respectively.

We collected six soil samples 6.1 m from the edge of the field and six samples from the opposite side of the field, and six soil samples from the center of the field as shown in Fig. 1b. We followed the soil sampling methods for South Dakota region29 to select the sample locations and number of samples collected. A total of 54 soil samples were collected at an 0.2 m depth from six patches (3 samples per patch) at the V4, V8, and V12 stages (18 samples per stage) using a hydraulic probe. We avoided sampling from the areas where conditions were different from the rest of the field (e.g., former manure piles, fertilizer bands, or fence lines). Figure 1b shows the sample locations across the patches.

Calibration

LIBS utilizes a high energy pulsed laser which generates a high temperature ranging from \(10^{\circ }\)\(20{,}000^{\circ }\hbox {K}\) resulting in plasma formation when focused on a sample. This, in turn, leads to ablation of a minuscule amount of sample, leading to excitation of the sample’s constituent elements. As the plasma cools, these excited atoms and electrons emit photons which correspond to specific elements present in the sample. These photons are collected by a spectrometer and result in quantitative and qualitative analysis of samples. The SciAps Z-300 handheld LIBS analyzer was used for these measurements. This device has an extended spectrometer wavelength range from \(190\,\upmu \hbox {m}\) to \(950\, \upmu \hbox {m}\). The extended range allows emission lines from elements H, F, N, O, Br, Cl, Rb and, S to be measured. The LIBS instrument is equipped with a Q-switched Nd:YAG laser, 5-6 mJ per pulse at 1064 nm. Ten laser pulses are shot on the soil samples in the presence of Ar purge to obtain averaged data on each measurement. The focused laser on the soil surface forms a \(\mu m\) size of a sample into \(>10{,}000^{\circ }\hbox {K}\) plasma. The unique emission spectrum is collected by the spectrometer as the plasma cools.

Figure 2
figure 2

Emission lines of soil samples at the V4, V8, and V12 stages for six patches.

We used NIST LIBS database30 to determine the N lines from the emission spectrum (Fig. 2) and found N lines at 493.4 nm, 746.6 nm, 821.4 nm, and 868.1 nm (Fig. 3). However, we discarded the 746.6 nm N lines due to weaker intensity response and inconsistency between the samples in wavelength. We verified the N lines from the study of soil nutrient detection for precision agriculture22. From the soil samples, we select four samples randomly and obtained the actual TN of soil in ppm for calibration. We analyzed all the 54 soil samples in LIBS to determine the N spectrum’s maximum intensity at 493.4 nm, 821.4 nm, and 868.1 nm (Figs. 5, 6, and 7) at V4, V8, and V12 stages.

Figure 3
figure 3

Determining N lines from the soil sample using NIST database.

Using the correlation between actual TN and the maximum intensity of N spectrum, we construct calibration plots for 493.4 nm, 821.4 nm, and 868.1 nm through linear regression (Fig. 4). We use \(R^2\) as our calibration metric and find \(R^2=0.98\), \(R^2=0.99\), and \(R^2=0.90\), respectively, showing a strong correlation between the actual soil TN and the peak intensity of the N spectrum. Using the calibrated model, we converted the peak intensity of the N spectrum (Figs. 5, 6, and 7) to TN (ppm) for all the 54 soil samples (Table 1) to generate the training data for the ML models.

Figure 4
figure 4

Calibration plot for computing soil TN using the peak intensity of the nitrogen spectrum at 493.4 nm, 821.4 nm, and 868.1 nm.

Figure 5
figure 5

Nitrogen spectrum of the soil samples at 493.4 nm for six patches at the V4, V8 and V12 stages.

Figure 6
figure 6

Nitrogen spectrum of the soil samples at 821.4 nm for six patches at the V4, V8 and V12 stages.

Figure 7
figure 7

Nitrogen spectrum of the soil samples at 868.1 nm for six patches at the V4, V8 and V12 stages.

Feature extraction and dataset

The multispectral images are composed of three channels, channel-1: R, channel-2: G, and channel-3: NIR. The multispectral sensor’s datasheet31 shows that channel-1 contains both R and NIR light. Therefore, the NIR light needed to be removed to isolate R and compute NDVI. The equations for R and NIR light are,

$$\begin{aligned} R&= 1.0 * DN_{ch1} - 1.012 * DN_{ch3} \end{aligned}$$
(1)
$$\begin{aligned} NIR&= 9.605 * DN_{ch3} - 0.6182 * DN_{ch1} \end{aligned}$$
(2)

where \(DN_{ch1}\) is the Digital Number (pixel value) of channel one, and \(DN_{ch3}\) is the Digital Number (pixel value) of channel three. The coefficients of DN were provided in the datasheet31.

Figure 8
figure 8

Band separation, and computed NDVI pixels and zonal NDVI.

Using Eqs. (1) and (2), band separation (Fig. 8a) was performed to compute NDVI (Fig. 8b) and extract the pixel values from each of the bands. The dataset (Table 1) was created using the mean NDVI and the mean pixel values of each of the bands from individual zones at the V4, V8, and V12 stages. The equation for computing NDVI,

$$\begin{aligned} NDVI = \frac{1.236*DN_{ch3}-0.188*DN_{ch1}}{1.000*DN_{ch3}+0.044*DN_{ch1}} \end{aligned}$$
(3)
Table 1 Generated training data for the ML models.

Methods

In supervised learning, the goal is to obtain an optimal predictive model function \(f^*\) based on the input x and the output y to minimize the cost function L(f(x), y). In this study, we particularly use MLP-R and SVR which can be used for both classification and regression problems. We applied HPO techniques to determine the best set of hyper-parameters from the ML models and train the ML models using those hyper-parameters on the training dataset.

Multi-layer perceptron regression (MLP-R)

Mulit-layer perceptron is a supervised learning algorithm that learns a function \(f(.): R^x \rightarrow R^o\) by training on a dataset32, where x is the number of input dimension and o is the number of output dimension. We designed the MLP-R (Fig. 9a) with multiple organized layers consisting of various neuron-like processing units. Each node in the layer was connected with the nodes in the previous layer. Each node may have symmetrical or differing strengths and weights. The data in the network enters with the input layer and gradually runs through each layer to reach the output layer. For a given a set of features \(x =\){R, NIR, G, NDVI, Air temperature, RH} and target \(y =\) TN, \(f(.): R^6 \rightarrow R^1\). To train the MLP-R from a given set of input-output pairs \(X = \{(\vec {x}_1, y_1),\ldots ,(\vec {x}_N, y_N)\}\), learning consists of iteratively updating the values of weight and bias of the perceptron to minimize RMSE. The hyper-parameter configuration (Table 2) was created using the solver type33, activation function34, learning rate and hidden layer sizes.

Figure 9
figure 9

(a) MLP-R with four hidden network having different weights, where input layer \(\in {\mathbb {R}}^6\), hidden layer \(\in {\mathbb {R}}^4\), output layer \(\in {\mathbb {R}}^1\) and n1, n2, n3, and n4 represent the number of perceptron in each hidden layer, respectively. (b) HPO for GS, RS and GA with cross-validation, training the ML models with the tuned HP and prediction.

Support vector regression (SVR)

Support vector machine (SVM) makes data points linearly separable by mapping them from low-dimensional to high-dimensional space. The classification boundary creates a partition between the data points by generating a hyperplane35. SVM concepts can be applied to regression problems by generalizing them. SVR uses a symmetrical loss function that penalizes both high and low misestimates equally. The \(\varepsilon\)-tube is used to generalize SVM to SVR by adding an \(\varepsilon\)-insensitive region around the function, ignoring the absolute values of errors less than a certain threshold \(\varepsilon\) from both above and below the estimation36,37. In SVR, points outside the tube are penalized, but points inside the tube, whether above or below the function, are not penalized. SVR uses different types of kernels for non-linear functions to map the data into a higher dimensional space34,36,37. Linear kernels, radial basis function (RBF), polynomial kernels, and sigmoid kernels are common kernel types in SVR. We created the hyper-parameter configuration (Table 2) using the kernel types, regularization parameter (C)34, and distance error (epsilon) of the loss function34.

Hyper-parameter optimization (HPO)

GS, RS, and GA HPO techniques were executed within their respective hyper-parameters to train the model. We performed cross-validation by splitting the train and test data into 5-folds. After obtaining the RMSE from the cross-validation score, we selected the hyper-parameters which yielded the lowest RMSE. Finally, using the best set of hyper-parameters we trained the MLP-R and SVR models for each HPO technique. Figure 9b shows the step-by-step process of HPO, training the dataset, and prediction of test data.

Table 2 Specifics of the configuration space for the hyper-parameters.

GS exhaustively evaluates all the combinations in the hyper-parameter configuration space specified by the user in the form of a grid configuration38. The user must identify the global optimums manually since GS cannot utilize the well-performing regions28. However, in RS, the user defines a budget (i.e., time) as well as the upper and lower bounds of the hyper-parameter values. RS randomly selects the values from the pre-defined boundary and trains until the budget is exhausted28. If the configuration space is wide enough, RS can detect the global optima. Assuming a model has k parameters and each of them has n distinct values, the GS computational complexity increases exponentially at a rate of \(O(n^k)\)39. Therefore, the effectiveness of GS depends on the size of the hyper-parameter configuration space. For RS, the computational complexity is defined as O(n), where n is specified by the user before the optimization process starts28.

GA40 randomly initializes the population and chromosomes. Genes represents the entire search space, hyper-parameters, and hyper-parameter values. GA uses a fitness function to evaluate the performance of each individual in the current generation similarly to the objective function of a ML model. To produce a new generation, GA performs selection, crossover, and mutation operations on the chromosomes involving the next hyper-parameter configurations to be evaluated. The cycle continues until the algorithm reaches the global optimum.

Results and discussion

To evaluate the HPO methods, we implemented five fold cross-validation and used RMSE as the performance metric. Additionally, we measured CT as a model efficiency metric. CT is the total time required to complete an HPO process. We specify the same hyper-parameter configuration space (Table 2) for all HPO methods to fairly compare GS, RS and GA. The optimal hyper-parameter configuration (Table 3) was determined by each of the HPO methods based on the lowest RMSE for all three wavelengths.

Figure 10
figure 10

Performance comparison for different HPO algorithms at different wavelengths.

Table 3 Optimal hyper-parameter configuration selected by different HPO algorithms at different wavelengths.

We tuned the models on a machine with an 8 Core i7-9700K processor and 16 gigabytes (GB) of memory. We used Python 3.5, multiple open-source Python libraries, and open-source Python frameworks, including sklearn34. Figure 10 shows that for both MLP-R and SVR, RS produces much faster results than GS while maintaining lower RMSE for the same search space size. In general, GA offers lower RMSE for both models but has a higher CT compared to GS and RS in all three wavelengths. Overall, MLP-R outperforms SVR in terms of performance. However, we achieved better efficiency with SVR in our dataset.

We introduced the machine learning approach to estimate the TN of soil using NDVI and multispectral characteristics (R, NIR and G) of the images. We also consider the environmental factors such as air temperature and RH. The performance of MLP-R and SVR models were tested on a fixed configuration space for the hyper-parameters under various hyper-parameter optimization techniques at three different wavelengths (Table 4). For both MLP-R and SVR, the default HP configuration do not yield the lowest RMSE, this demonstrates the significance of utilizing HPO. From Table 4, the estimation error of predicting soil TN is lowest in GA compared to GS and RS for both MLP-R and SVR, where \(\mu\) is the mean and \(\sigma\) is the standard deviation. While training the models, we split our dataset into train and test for all three wavelengths individually, where we use 80% of the data for training and 20% for testing.

Table 4 Estimation error for predicting soil TN.

The UMS framework can be used to estimate the total nitrogen in soil. However, depending on the types of soil and crops, the model needs to be re-calibrated. More specifically, the actual TN of soil should be obtained from the subset of the samples to calibrate the N spectrum’s intensity after determining the N lines using LIBS. Furthermore, N lines that fall around the 500 nm region should be avoided in sea sand due to the interferences with Titanium (Ti) lines23.

Conclusions

In this paper, we have demonstrated the ability of a UAV-based multispectral sensing solution to estimate soil total nitrogen. Specifically, we implemented two machine learning models multilayer perceptron regression and support vector regression to predict soil total nitrogen using a suite of data classes including UAV-based imaging data in red, near infrared, and green spectral bands, normalized difference vegetation indices (computed using the multispectral images), air temperature, and relative humidity. We performed hyperparameter optimization methods to tune the models for prediction performance. Overall, our numerical studies confirm that our machine learning-based predictive models can estimate total nitrogen of the soil with a root mean square percent error (RMSPE) of 10.8%.