Estimation of leaf nutrition status in degraded vegetation based on field survey and hyperspectral data

Timely monitoring of global plant biogeochemical processes demands fast and highly accurate estimation of plant nutrition status, which is often based on hyperspectral data. However, few such studies have been conducted on degraded vegetation. In this study, complete combinations of either original reflectance or first-order derivative spectra were developed to quantify leaf nitrogen (N), phosphorus (P), and potassium (K) contents of tree, shrub, and grass species, using hyperspectral datasets from lightly, moderately, and severely degraded vegetation sites in Helin County, China. Correlation analysis between these combinations and leaf N, P, and K contents was used to identify suitable combinations. The most effective combinations were those of reflectance difference (Dij), normalized difference (ND), first-order derivative (FD), and first-order derivative difference (FD(D)). Linear regression analysis was used to further optimize the sensitive band-based combinations, which were then compared with 43 frequently used empirical spectral indices. The proposed hyperspectral indices effectively quantified leaf N, P, and K contents (R2 > 0.5, p < 0.05), indicating that hyperspectral data can potentially be used for fine-scale monitoring of degraded vegetation.


Materials and Methods
Study area. The study was conducted in Helin County, Inner Mongolia, north China. Helin County is located in the northern agro-pastoral ecotone and consists of flat plains, hills, and mountains in almost equal proportion (Fig. 1). The average elevation of the county is 1176 meters above sea level and its total area is 3401 square kilometers. Helin County has a temperate climate with distinct wet and dry seasons and an annual average temperature of 5.6 °C, averaging −12.8 °C in January and 22.1 °C in July. The average annual precipitation is 417 millimeters, averaging 30 millimeters in January and 103 millimeters in July. Average wind speeds are slightly higher in spring and winter than in summer and fall. Average relative humidity shows no obvious seasonal change over the year. The semi-arid climate supports sandy vegetation dominated by grasses and shrubs.
Data were collected over 28 days under clear-sky conditions between 10:00 and 14:00 local time in July and August of 2012 and 2013, the period of peak vegetation growth in the area. Leaf N, P, and K contents and hyperspectral data were recorded at each of the three degradation intensities from the middle leaves of eight plants of each of six dominant species (Setaria viridis, Agropyron cristatum, Salsola collina, Caragana microphylla, Lespedeza davurica, and Pinus sylvestris var. mongolica). Hyperspectral data were recorded with a handheld ASD FieldSpec 2 portable spectrometer (Analytical Spectral Devices Inc., USA), which has a spectral range of 325 to 1075 nm and a 1 nm bandwidth (www.asdi.com). Leaf reflectance was measured with a leaf clip attaching the fiber optics to the leaves. After the reflectance measurements were complete, leaf samples were collected and oven-dried at 70 °C ± 5 °C for 72 h, and the dry matter was analyzed for N, P, and K. Leaf N was measured by the Kjeldahl method 33, leaf P by the phosphovanadate method 34, and leaf K by atomic absorption spectroscopy 22. Results were expressed as mg g-1 leaf dry matter. In total, 144 paired reflectance and leaf nutrition samples were collected (8 plants × 6 species × 3 degradation intensities). 64-pair samples were …

Reflectance differentiation along degradation intensity. We first investigated vegetation reflectance at various degradation intensities and estimated the differences in spectral response. The spectral reflectance of the leaves of the dominant plant species was measured at three degradation intensities: light, moderate, and severe. We used detrended canonical correspondence analysis (DCCA) to study these differences in spectral response. DCCA uses two matrices: a matrix of response variables, denoted Y, which often contains the degree of vegetation degradation, and a matrix of explanatory variables (e.g. reflectance at each band), denoted X, which is used to explain the variation in Y, as in regression analysis. In DCCA with detrending by segments and Hill's scaling, the length of the longest axis provides an estimate of the variation in reflectance. The unconstrained ordination provides a basic overview of the compositional gradients in the data. DCCA was performed in Canoco for Windows 4.5 35. If DCCA demonstrated statistically significant differences between degradation intensities, we then selected the best performing hyperspectral indices, namely those with the highest consistency across the three degradation intensities. The indices with the highest correlation coefficients were selected as the final best indices to predict leaf N, P, and K contents.

Development and validation of hyperspectral indices.
Due to noise in the raw data, the marginal ranges 325-350 nm and 1000-1075 nm were removed from each spectrum. Instead of using discriminant analysis to select the optimum bands, we derived the complete combination of spectral indices between all channels. A spectral index is a mathematical combination of spectral band values designed to enhance the information content with respect to the parameter under study. Most published indices 36 are expressed as the reflectance or first-order derivative at a given wavelength, or as a wavelength difference (Dij), ratio (RR), normalized difference (ND), or inverse reflectance difference (ID). Ten common types of indices based on both original reflectance and derivative spectra were used:
$$R = R_i \tag{1}$$
$$D_{ij} = R_i - R_j \tag{2}$$
$$RR_{ij} = R_i / R_j \tag{3}$$
$$ND_{ij} = (R_i - R_j)/(R_i + R_j) \tag{4}$$
$$ID_{ij} = 1/R_i - 1/R_j \tag{5}$$
$$FD = FD_i \tag{6}$$
$$FD(D)_{ij} = FD_i - FD_j \tag{7}$$
$$FD(RR)_{ij} = FD_i / FD_j \tag{8}$$
$$FD(ND)_{ij} = (FD_i - FD_j)/(FD_i + FD_j) \tag{9}$$
$$FD(ID)_{ij} = 1/FD_i - 1/FD_j \tag{10}$$
where R is reflectance, FD is the first-order derivative spectrum, and the subscripts i and j denote wavelengths (nm). Over the entire 350 to 1000 nm domain, these indices were evaluated by regression analysis against leaf N, P, and K contents. Visual inspection indicated that the relationships with leaf N, P, and K contents were linear, so we calculated the coefficient of determination (R2) and the corresponding significance level (p) for the complete combinations (Eq. 1-10) over the entire 350-1000 nm range. The optimum combination for each nutrient was identified as the one with the highest R2.
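The complete-combination screen can be illustrated for a single index type. The sketch below assumes a reflectance matrix with one row per sample and one column per 1 nm band, and computes the R2 of a simple linear fit of a nutrient content on ND_ij for every band pair. Because the R2 of a simple linear regression equals the squared Pearson correlation, no explicit fitting is needed. The function name `nd_r2_map` is hypothetical; the other nine index types would follow by swapping the ND expression, and significance testing is omitted.

```python
import numpy as np


def nd_r2_map(reflectance, y):
    """Return an (n_bands, n_bands) array of R^2 for y ~ ND_ij over all band pairs.

    reflectance: (n_samples, n_bands) array; y: (n_samples,) nutrient contents.
    """
    R = np.asarray(reflectance, dtype=float)
    y = np.asarray(y, dtype=float)
    n_bands = R.shape[1]
    y_c = y - y.mean()
    syy = y_c @ y_c
    r2 = np.zeros((n_bands, n_bands))
    # Loop over band i and vectorize over band j to keep memory at (n_samples, n_bands)
    for i in range(n_bands):
        Ri = R[:, [i]]                       # keep as a column for broadcasting
        nd = (Ri - R) / (Ri + R)             # ND_ij for all j, shape (n_samples, n_bands)
        nd_c = nd - nd.mean(axis=0)
        sxx = (nd_c ** 2).sum(axis=0)
        sxy = y_c @ nd_c
        ok = sxx > 0                         # the diagonal (ND_ii = 0) has no variance
        # For simple linear regression, R^2 = r^2 = sxy^2 / (sxx * syy)
        r2[i, ok] = sxy[ok] ** 2 / (sxx[ok] * syy)
    return r2
```

The optimum pair is then `np.unravel_index(np.argmax(r2), r2.shape)`, mapped back to wavelengths. Since ND_ji = -ND_ij, the map is symmetric and only one triangle needs inspecting.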
Only a few bands remained after screening by R2. These bands in the optimum combinations were further filtered through stepwise linear regression analysis. This analysis reduces a set of redundant, collinear spectral variables to a few informative bands, thus limiting the potential overfitting typical of correlation-based band selection. The stepwise regression model is:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k$$
where Y is the dependent variable (leaf nutrient content); β0 is the regression constant; β1, β2, ..., βk are the partial regression coefficients of the independent variables X1, X2, ..., Xk (each a spectral variable at one band); and k is the number of independent variables. To evaluate the newly developed hyperspectral indices, we derived 43 frequently used empirical indices from the published literature and compared their performance with that of the new indices in terms of R2 and significance level. The models with the largest R2 and highest statistical significance were regarded as optimal. The models were calculated and compared in SPSS 19.0.
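Stepwise regression alternates forward entry and backward removal of variables. The NumPy sketch below illustrates that idea; it is not SPSS's implementation. It assumes fixed F-to-enter and F-to-remove thresholds (4.0 and 3.9, a classic textbook choice) instead of the entry and removal probabilities SPSS uses, and the function names are hypothetical. Columns of `X` hold the candidate sensitive-band variables.

```python
import numpy as np


def _rss(X, y, cols):
    """Residual sum of squares of an OLS fit of y on an intercept plus X[:, cols]."""
    A = np.column_stack([np.ones(len(y)), X[:, np.array(cols, dtype=int)]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)


def stepwise_select(X, y, f_in=4.0, f_out=3.9, max_steps=100):
    """Return column indices of X chosen by forward entry / backward removal with partial F tests."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    selected = []
    for _ in range(max_steps):
        changed = False
        # Forward step: add the candidate with the largest partial F, if it reaches f_in
        rss_cur = _rss(X, y, selected)
        best_f, best_j = -np.inf, None
        df_resid = n - len(selected) - 2          # residual df after adding one variable
        if df_resid > 0:
            for j in range(p):
                if j in selected:
                    continue
                rss_new = max(_rss(X, y, selected + [j]), 1e-12)
                f = (rss_cur - rss_new) / (rss_new / df_resid)
                if f > best_f:
                    best_f, best_j = f, j
        if best_j is not None and best_f >= f_in:
            selected.append(best_j)
            changed = True
        # Backward step: drop the weakest selected variable if its partial F falls below f_out
        if len(selected) > 1:
            rss_full = max(_rss(X, y, selected), 1e-12)
            df_full = n - len(selected) - 1
            worst_f, worst_j = np.inf, None
            for j in selected:
                rss_red = _rss(X, y, [c for c in selected if c != j])
                f = (rss_red - rss_full) / (rss_full / df_full)
                if f < worst_f:
                    worst_f, worst_j = f, j
            if worst_f < f_out:
                selected.remove(worst_j)
                changed = True
        if not changed:
            break
    return selected
```

Setting `f_out` slightly below `f_in` keeps a variable that has just entered from being removed in the same step; `max_steps` guards against rare enter/remove cycles.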

Results
Reflectance response to degradation intensity. Cluster distributions of reflectance along the degradation gradient are presented in Fig. 2. Differences tended to be more pronounced with greater degradation intensity (Fig. 2). T-tests at the 350 nm (optical) and 1000 nm (NIR) bands both indicated significant differences between degradation intensities (p < 0.05). This may be partly explained by the enhanced vertical leaf distribution in lightly degraded vegetation, which had higher leaf density and canopy cover than severely degraded vegetation 37. In severely degraded vegetation, absorption in the visible and red-edge regions decreased alongside leaf nutrition status, which shifts the reflectance towards the blue end of the spectrum, away from the red observed in lightly degraded vegetation 38. Consequently, the cluster of reflectance points measured in lightly degraded vegetation can be separated from the clusters associated with severely degraded vegetation.
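The text does not state which t-test variant was applied. As an assumption, the sketch below computes Welch's unequal-variance t statistic and its Welch-Satterthwaite degrees of freedom for reflectance at one band in two degradation groups, using only NumPy. The function name `welch_t` is hypothetical; `scipy.stats.ttest_ind(a, b, equal_var=False)` returns the same statistic together with a p-value.

```python
import numpy as np


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for two samples."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    va = a.var(ddof=1) / a.size          # squared standard error of each group mean
    vb = b.var(ddof=1) / b.size
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))
    return float(t), float(df)
```

Applied per band, e.g. `welch_t(R_light[:, k], R_severe[:, k])` for hypothetical arrays `R_light` and `R_severe` holding the reflectance spectra of each group.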
Correlation curves of reflectance-based complete combinations with leaf N, P, and K contents. Figure 3 presents an indicative subset of the relationships of R, Dij, RR, ND, ID, FD, FD(D), FD(RR), FD(ND), and FD(ID) with leaf N, P, and K contents, showing the correlations between the various reflectance combinations and plant nutrition content. These curves are a useful source of information for relating spectra to the physiological parameters under study 36, allowing optimized selection of effective wavelengths and bandwidths. Among the various combinations of Ri and Rj, those with the largest number of significant correlation coefficients were Dij, FD(D), RR, and ND for leaf N content; Dij, FD(R), and FD(D) for leaf P content; and Dij, RR, ND, FD, and FD(D) for leaf K content. These were selected as candidate combinations for further analysis.
The sensitive bands related to leaf N content were mostly found in the green, green-yellow edge, middle red, and NIR regions of the spectrum. Leaf P content had the fewest significant coefficients in the combinations of reflectance and its FD (Fig. 3). The curves for FD-P and FD(D)-P indicated sensitive bands in the green-yellow edge, red, and NIR regions. The sensitive bands for leaf K content were mainly located in the short wavelength bands, ranging from 360 nm to 450 nm and covering the violet, blue, cyan, green, and yellow regions of the spectrum. FD values did not show a significant relation with leaf nutrient contents.
Dij, ND, RR, FD(D), and FD(R) responded differently to leaf nutrition across wavelengths (Fig. 3). For example, the correlation coefficient between Dij and leaf N content was significant in one band but non-significant in the neighboring wavebands. Some bands had significant coefficients in several combinations, for example in the Dij, ND, and FD(D) curves. On this basis, the bands with the largest number of significant coefficients across all combinations were considered the most sensitive and indicative of plant leaf nutrition content. By counting the number of significant correlation coefficients for each combination, we identified 22 sensitive bands for leaf N, P, and K content.
(2020) 10:4361 | https://doi.org/10.1038/s41598-020-61294-7 www.nature.com/scientificreports
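The band-counting rule can be sketched under stated assumptions: each combination is reduced to one correlation curve over the bands, a band counts as significant in a curve when |r| exceeds a critical value r_crit, and bands are ranked by how many curves they are significant in. For a two-tailed test with n samples, r_crit = t_crit / sqrt(n - 2 + t_crit²), where t_crit is the critical t value with n - 2 degrees of freedom; here r_crit is passed in rather than computed. The function name and the `top_k` cut-off are hypothetical.

```python
import numpy as np


def rank_sensitive_bands(curves, wavelengths, r_crit, top_k=22):
    """Rank bands by how many correlation curves are significant there.

    curves: mapping of combination name (e.g. "Dij") -> 1-D array of r values, one per band.
    Returns (wavelengths, counts) of up to top_k bands, most frequently significant first.
    """
    wl = np.asarray(wavelengths)
    stacked = np.vstack([np.asarray(c, dtype=float) for c in curves.values()])
    counts = (np.abs(stacked) > r_crit).sum(axis=0)    # significant curves per band
    order = np.argsort(-counts, kind="stable")         # ties keep wavelength order
    keep = order[:top_k]
    keep = keep[counts[keep] > 0]                      # drop bands with no significant curve
    return wl[keep], counts[keep]
```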

Development of new hyperspectral indices.
We applied stepwise linear regression to the 22 sensitive bands identified above to find, for each index type, the band combinations best suited to estimating leaf nutrition. Previously established methods 39 were then applied to derive high-accuracy regression equations for leaf N, P, and K content.

Assessment of empirical hyperspectral indices. Forty-three empirical indices reported in previous publications were evaluated to identify the optimal indices (Table 2). Correlation coefficients varied markedly among degradation intensities. As reported in Peng et al. 39, under light degradation, Viopt, FD525-570, MSS-DVI, SDb, and SDr were significantly negatively correlated with measured contents, whereas under severe degradation the relationships were significantly positive. Conversely, NVI and SDy showed significantly positive relationships under light degradation and negative relationships under severe degradation. Thus, the performance of spectral indices for leaf nutrition contents depends on degradation intensity.
We selected three spectral indices for leaf nutrition estimation (RES, DVI, and FD730-570) for their ease of use and accuracy. These indices have high correlation coefficients with leaf nutrition contents (Table 2). A comparison of R2 values between the optimized stepwise regression indices derived from the complete combinations (Table 1) and the empirical indices (Table 2) showed that the R2 values of the proposed stepwise regression indices were significantly higher than those of the best performing empirical indices.
We then established a suitable equation for each of the three selected indices from Table 2. Linear regression equations were constructed (Table 3) from the field-measured leaf N, P, and K contents and the corresponding empirical indices RES, DVI, and FD730-570 for all three degradation intensities. These equations indicated that all three empirical indices can predict leaf nutrition content at a statistically significant level, with the exception of RES and DVI for leaf P content.
Validating selected empirical and newly developed hyperspectral indices. Referencing … plant leaf nutrition content on validation samples. Linear regression and correlation coefficients showed that the predicted values reflected the field-measured leaf nutrition contents at a statistically significant level (Fig. 4). Consistent with previously published results, the R2 of predictions from the empirical indices was lower than that from the newly developed indices. Both the new and the empirical hyperspectral indices predicted leaf K content better than leaf N and P content.
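The validation step can be sketched assuming a simple linear calibration y = a + b·x of one index on calibration samples, followed by scoring predicted against measured values on held-out validation samples. R2 is taken here as the squared Pearson correlation between measured and predicted values, and RMSE is added as a common companion metric; neither the calibration/validation split nor RMSE is specified in the text, and `validate_linear_index` is a hypothetical name.

```python
import numpy as np


def validate_linear_index(x_cal, y_cal, x_val, y_val):
    """Fit y = a + b * x on calibration data; score the predictions on validation data."""
    # np.polyfit returns coefficients from highest degree down: [slope, intercept]
    b, a = np.polyfit(np.asarray(x_cal, float), np.asarray(y_cal, float), deg=1)
    y_val = np.asarray(y_val, float)
    y_pred = a + b * np.asarray(x_val, float)
    r = np.corrcoef(y_val, y_pred)[0, 1]              # Pearson r, measured vs predicted
    rmse = float(np.sqrt(np.mean((y_val - y_pred) ** 2)))
    return {"slope": float(b), "intercept": float(a), "r2": float(r ** 2), "rmse": rmse}
```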

Discussion
A handheld spectrometer directly acquires detailed spectra in the visible and near-infrared regions, which contain bands related to leaf N, P, and K 22,30. Many empirical spectral indices have been developed from correlation analysis of field survey and satellite remote sensing data, including the 43 indices used in this study. However, retrieving leaf nutrition information from satellite remote sensing data and other space-based observations is challenging because of atmospheric effects. Vegetation characteristics and background reflectance may also confound the compound signals received by the sensors 40. Therefore, leaf nutrition status can be better estimated by spectral indices based on narrow, sensitive bands that experience less atmospheric influence and background disturbance, for both multispectral satellites and hyperspectral spectrometers. Based on this hypothesis, we combined the reflectance and its first-order derivative at every waveband and sensitive range acquired by the ASD spectrometer. Bands sensitive to leaf nutrition content were identified, and the most suitable equations of combined narrow bands were selected and applied to predict leaf nutrition content. The results demonstrate that plant leaf N, P, and K contents can be better predicted by the newly developed models than by empirical spectral indices alone.
The use of FD values contributes to the high accuracy of the results. FD spectra, obtained by dividing the difference in reflectance between successive wavebands by the wavelength interval, suppress overlapping spectral features and background noise 41. FD is currently used to decompose mixed spectra and reduce noise in hyperspectral data 17. Spectral indices based on FD have been found to be highly sensitive to many physiological parameters of leaves and are therefore strong predictors of leaf nutrition content 42,43. However, few studies have examined the performance of FD spectra across the 400-900 nm wavelengths. Our study did so by producing the complete combination of FDs, and the use of FD improved the performance of our proposed hyperspectral indices.
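The FD described above can be sketched as a forward difference: the reflectance difference between successive bands divided by their wavelength spacing. Which differencing scheme the authors used is not stated, so this is an assumption; `np.gradient(R, wl, axis=-1)` would give a central-difference alternative with the same length as the input. The function name is hypothetical.

```python
import numpy as np


def first_derivative(reflectance, wavelengths):
    """Forward-difference FD spectra: (R(l_{i+1}) - R(l_i)) / (l_{i+1} - l_i).

    reflectance: (n_samples, n_bands). Returns an (n_samples, n_bands - 1) array and
    the wavelengths l_i at which each difference is reported.
    """
    R = np.asarray(reflectance, dtype=float)
    wl = np.asarray(wavelengths, dtype=float)
    fd = np.diff(R, axis=-1) / np.diff(wl)   # broadcasts the band spacing over samples
    return fd, wl[:-1]
```

The resulting FD array can be fed to the same pairwise combinations as reflectance to form FD(D), FD(RR), FD(ND), and FD(ID).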
Stepwise linear regression of the FD variables in our study may also have improved the estimation of leaf nutrition status, by keeping the number of variables considerably smaller than the number of samples and thereby avoiding potential overfitting 44. With the number of variables kept small in this way, a regression that can account for potential confounding factors is preferable to a simple index-based approach.
The best results were also attributed to the identification of sensitive bands in the hyperspectral data. Although selecting sensitive bands is extremely important for increasing estimation accuracy, the selection method itself remains challenging. Wavelengths identified as most sensitive to N vary between studies. Zhao et al. (2005) found that leaf N was most responsive at 517 and 701 nm in cotton 9, while Buscaglia and Varco (2012) identified a stronger relationship between cotton leaf N content and leaf or canopy reflectance in the green wavelengths than in the red-edge or NIR region 14. Using six different models, Yao et al. (2015) found that 690/695, 709/710, 700/705, 713/727, 1200, and 1335/1340 nm, located in the red-edge and near-infrared regions, were the sensitive wavelengths for N 17.
However, for leaf N estimation in wheat, wavelengths of 384, 492, 695, 1339, and 508 nm and 681, 722, 960, 1264, and 1369 nm were found to perform best 17. Among these bands, the visible bands at 384, 492, and 508 nm are often strongly absorbed by chlorophyll and carotenoids in green plants; the bands at 681, 695, and 722 nm in the red and red-edge range can serve as sensitive N indicators; and the near- and shortwave-infrared bands at 960, 1264, 1339, and 1369 nm are indicators of proteins (of which N is a main component) 39.

Various factors can affect the accuracy of leaf N estimates. Tarpley et al. (2000) found that leaf N can be overestimated by indices constructed from green or yellow-orange wavelengths, potentially due to confounding factors such as macro- and micronutrient deficiencies 45 … the canopy scale 47, and that the bands suitable for estimating leaf N varied between plant growth stages. Buscaglia and Varco (2012) found that the wavelength most sensitive to leaf N, and consequently best correlated with cotton leaf N content, was 612 nm at squaring and 728 nm at the flowering stage 14.
The newly developed spectral models, based on sensitive bands and optimized by combination, have high coefficients of determination under various environmental conditions. The three degradation intensities encompassed conditions that can affect the results. First, the increase in leaf N content from severely to lightly degraded vegetation may induce a saturation effect. Second, macro- and micronutrient deficiencies in severely degraded vegetation under stress may lead to overestimation of leaf N content. Finally, differences in developmental stage among the dominant plant species may shift the bands sensitive to leaf N content. Despite these conditions and many other possible disturbance factors, the newly developed spectral models remained relatively stable and robust in their estimates of leaf N content.
The same hyperspectral analysis procedure was used to estimate leaf P and K contents and also yielded the best results. Narrower sensitive bands were identified by comparing the correlation coefficients of the index combinations, and stepwise linear regression was then conducted on these sensitive bands for each of the three degradation intensities. This can increase the accuracy of leaf P and K estimation by accounting for environmental conditions across species. The sensitive bands for P were 416, 421, 424, 427, 458, 485, 664, 819, 828, 839, 902, and 933 nm, in the visible and NIR regions of the spectrum. The sensitive bands for K were 457, 483, 646, 731, 835, 900, 916, and 919 nm, which lie in the blue, red, red-edge, and NIR regions. These bands fall within ranges reported in previous studies 22,30 and show higher correlation coefficients. Because these bands were extracted from six dominant species of temperate vegetation, they may be applicable more generally.
The high accuracy of the newly developed spectral models may be partly attributable to the removal of the spectral water absorption region at the start of hyperspectral data processing. Water absorption mainly affects spectra above 790 nm 8; we used FD and combinations of different sensitive bands to weaken such effects. The use of hyperspectral data with hundreds of bands may also help to increase the accuracy of leaf nutrition estimates. Previous studies show no shared optimal three-band spectral index; instead, a normalized difference spectral index can be used to estimate leaf N, P, or K content in different plant species 8,48. Selecting several sensitive bands from the hundreds available is arguably more precise than relying on a few fixed bands. Most satellite remote sensing data have only four to seven bands, which limits their ability to estimate the physiological parameters of leaf health.
The new spectral models were developed with general applicability in mind. First, the models extracted spectral information from six dominant species, including trees, shrubs, and grasses, representing a wide range of spectral characteristics. Second, we tested model accuracy across degradation intensities and selected only models that performed consistently across vegetation states. Third, the complete combinations of original reflectance and its first-order derivative over 350-1075 nm, the range covered by the majority of portable spectrometers, have wide potential for application. These new models may therefore help to monitor degraded vegetation. However, users should take into account the dominant species in the vegetation of each ecosystem when applying our spectral models to estimate leaf nutrition contents, since different species show different spectral traits even when they have the same chemical contents. In addition, more advanced methods such as partial least squares regression, support vector regression, and random forest are increasingly used to analyze hyperspectral data 49,50; these can also predict leaf nutrition contents and are recommended for future studies.

Conclusions
This study used complete combinations of sensitive bands and 43 empirical spectral indices, applied to three datasets collected in situ from lightly, moderately, and severely degraded vegetation in temperate Inner Mongolia, China, to estimate leaf nutrition contents. Among the empirical indices, RES, DVI, and FD730-570 performed best.
Indices based on reflectance difference (Dij), normalized difference (ND), first-order derivative (FD), and first-order derivative difference (FD(D)) at sensitive bands, selected by Pearson correlation analysis under various community conditions and optimized by stepwise linear regression, were the most effective in predicting leaf nutrition contents (R2 = 0.5-0.8, p < 0.05). These indices, extracted from narrow sensitive bands, were statistically significant and performed better than the empirical indices. Therefore, they may be regarded as general indices that adequately represent nutritional content. This demonstrates the potential of hyperspectral data for monitoring leaf nutrition status at a fine scale. Because these models rely on very narrow bands, they can only be applied with spectrometers of very high spectral resolution (1-3 nm). However, curves of the coefficient of determination can aid in locating indices tailored to other remote sensors. This understanding may help to explore the potential of hyperspectral data for quantifying leaf nutrition content. In addition, it would be useful to test the proposed indices with airborne and satellite hyperspectral imagery in future studies, to provide a set of indicators with wider generality.