Feasibility of Using Rice Leaves Hyperspectral Data to Estimate CaCl2-extractable Concentrations of Heavy Metals in Agricultural Soil

Heavy metal contamination is a serious problem in China. Estimating the bioavailable concentrations of heavy metals in agricultural soil is necessary to safeguard food security and human health. This study aimed to use hyperspectral data of rice (Oryza sativa) leaves as an indicator to retrieve the CaCl2-extractable concentrations of heavy metals in agricultural soil. Twenty-one rice samples, together with the corresponding soil samples and reflectance spectra of rice leaves, were collected. The potential relations between hyperspectral data and CaCl2-extractable heavy metals (E-HM) were explored. Partial least-squares regression (PLSR) with leave-one-out cross-validation was used to predict the concentrations of CaCl2-extractable cadmium (E-Cd) and CaCl2-extractable lead (E-Pb) in farmland soil. The results showed that the concentrations of E-Cd in soil were significantly correlated with the concentrations of Cd in rice leaves, and that more bands were associated with E-Cd than with E-Pb. Four indices (normalized difference vegetation index (NDVI), carotenoid reflectance index (CRI), photochemical reflectance index 2 (PRI2) and normalized pigments chlorophyll ratio index (NPCI)) were significantly (P < 0.05) and negatively related to the E-Cd concentrations. The PLSR model of E-Cd concentrations (R2 = 0.592, RMSE = 0.046) performed better than the PLSR model of E-Pb concentrations. We conclude that if rice is sensitive to E-HM and/or the crop is stressed by E-HM, hyperspectral data of field rice leaves hold potential for estimating the concentration of E-HM in farmland soil. This method therefore provides new insight into monitoring the E-HM content of agricultural soil.

that a close relationship was found between the Cd concentration in vegetables and its concentration in the CaCl2 extract 19. Anjos et al. (2012) used five extractant solutions to evaluate the available fraction of aluminium (Al), Pb, manganese (Mn) and zinc (Zn) at a Pb mine, and found that CaCl2 seems to be a good extractant medium 20.
However, traditional CaCl2 extraction methods are time-consuming and expensive 21, and it is highly challenging to use field sampling and wet chemistry for regular monitoring of heavy metal uptake at large scales. Compared with most chemical analyses, remote sensing is simple, time-saving and labor-saving in soil monitoring 22. In particular, the emergence of hyperspectral remote sensing makes it possible to monitor soil minerals, water, nutrients, salinity and other constituents. With continuous sampling and high spectral resolution (<5 nm), hyperspectral sensors can discriminate critical spectral differences in detail 23. Some researchers have applied hyperspectral reflectance to detect heavy metals in soil 8,24,25. However, some heavy metals are spectrally featureless in the visible and near-infrared parts of the electromagnetic spectrum 26.
Compared with deriving heavy metal concentrations directly from soil spectra, indirect access to soil heavy metal concentrations through plants is more practical. When plants are stressed, the biochemical contents (e.g., chlorophyll) of their leaves may change, and the spectral properties (reflectance and transmittance) at specific wavelengths (e.g., red, green, blue and red edge bands) change with the biochemical contents of the leaves 27. Plants can therefore be used as bridges to detect elements in the soil with hyperspectral remote sensing techniques. Hyperspectral remote sensing has been used to detect stress in plants before visible symptoms are observed [28][29][30], such as water deficiency 31, metal accumulation 32, diseases 33 and salt 34. Compared with monitoring stress in plants, using plants as an indicator to estimate CaCl2-extractable concentrations of heavy metals in agricultural soil by remote or proximal sensing is less studied and, according to our review, rarely reported in the literature.
The main objective of this study was to evaluate the effectiveness of using spectral reflectance at the leaf scale to quantify heavy metal concentrations in agricultural soil in Zhangjiagang, Suzhou, China. The aims of our study were: (1) to analyze the relationship between heavy metal concentrations in rice leaves and CaCl2-extractable heavy metal (E-HM) concentrations in soil; (2) to determine the optimum variables that are most sensitive to E-HM concentrations; and (3) to establish a PLSR model for estimating E-HM concentrations in agricultural soil using the optimum sensitive variables from hyperspectral data of rice leaves.

Materials and Methods
Description of study area. Located in the eastern part of the Yangtze River Delta (31°43′-32°02′N, 120°21′-120°52′E), Zhangjiagang city covers approximately 999 km2, of which 799 km2 are terrestrial areas (Fig. 1). The average annual temperature is 17.3 °C and the average annual precipitation is 1556.5 mm 35. The soil types are mainly fluvo-aquic soil and paddy soil 36. With well-developed chemical, metallurgical, electroplating, printing and dyeing, and papermaking industries, among others, Zhangjiagang is one of the fastest growing cities in the Yangtze River Delta.
www.nature.com/scientificreports
Field sampling and hyperspectral measurement. A total of 21 sampling sites (Fig. 1) were set in agricultural areas during September 2017. At each sampling site, the hyperspectral reflectance of the rice leaves was measured, and samples of rice and their root soil (0-20 cm depth) were taken. The rice and soil samples were packed into polyethylene bags. Five random samples were taken at each site and bulked together as one composite sample. The location of each sampling site was acquired using a Global Positioning System receiver (GPS, UniStrong G120) with an accuracy of about 3 m.
The hyperspectral reflectance of the rice leaves was obtained using a field portable spectrometer (UniSpec, PP Systems, Haverhill, MA, USA). The spectral range and spectral resolution of the sensor were 310-1100 nm and 3.3 nm, respectively. A bifurcated fiber optic cable and a leaf clip (models UNI410 and UNI501, PP Systems, Haverhill, MA, USA) were used to measure the leaf reflectance of rice. The leaf clip held the fiber at a 60° angle to the leaf surface. Leaf illumination was provided by a halogen lamp in the spectrometer through one side of the bifurcated fiber. To minimize measurement noise in the reflectance spectra, three spectral measurements were made on the fully expanded leaves near the top of each bundle, and the 15 results were averaged into one spectral measurement for the sampling site. A barium sulfate panel was used as a white reference standard to calibrate and optimize the spectrometer before each measurement.
Laboratory analysis. Soil heavy metal concentration measurements. Soil samples were air-dried at room temperature (26-28 °C) and then sieved through a 2-mm nylon mesh to remove stones and other debris. Total concentrations of Cd and Pb in the soil were determined as follows: 0.2 g of soil was digested with 10 ml of a mixed solution of HNO3, HClO4 and HF (1:1:2, v/v/v) in a polytetrafluoroethylene digestion tank by microwave digestion for 15 minutes (the proportion of acid and the digestion time can be adjusted according to sample conditions); the final solution was diluted to 50 ml with deionized distilled water and analyzed by inductively coupled plasma mass spectrometry (ICP-MS, X2, Thermo Electron Corporation) 37,38.
CaCl2-extractable concentrations of Cd and Pb were determined as follows: a 25 ml aliquot of 0.01 M CaCl2 solution was added to a 5 g soil sample (<2 mm) in a 100 ml conical flask, and the suspension was shaken at 250 rpm at 25 °C. After 12 h of shaking, the supernatant was separated from the solid phase by centrifugation at 3000 rpm for 20 min. The concentrations of Pb and Cd in the supernatant were analyzed with ICP-MS 18.
Rice samples were thoroughly washed in deionized water and oven-dried at 70 °C to constant weight. To analyze the Cd and Pb concentrations in rice leaves, a 0.2 g sample was digested with 5 ml of a 5:2 (v/v) mixture of HNO3 and H2O2 in centrifuge tubes at room temperature. The solution was then heated in a microwave accelerated reaction system (Anton-Paar PE Multiwave 3000) for 20 min. The digest was diluted with 43 ml of deionized water and analyzed for total Cd and Pb with ICP-MS.
Hyperspectral data pretreatment. The original hyperspectral signal is susceptible to environmental influences, so the original spectra were preprocessed to enhance the spectral features and to extract more information about heavy metals in the soil. Wavelengths shorter than 420 nm and longer than 980 nm were not analyzed because of excessive noise 39; thus a total of 560 wavelengths were used as the raw spectral reflectance, automatically interpolated from 3.3 nm to 1 nm in calibration 23. This processing was done in Excel 2007 (Microsoft Inc.).
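The trimming and resampling step can be illustrated with a short sketch. It is not the Excel procedure used in the study: linear interpolation is an assumption, and the function name `resample_to_1nm` and the synthetic spectrum are hypothetical. The grid below includes both endpoints and therefore has 561 points, whereas the text reports 560 wavelengths; which endpoint was excluded is not stated.

```python
from bisect import bisect_right


def resample_to_1nm(wavelengths, reflectance, start=420, stop=980):
    """Linearly interpolate a spectrum onto a 1 nm grid from start to stop (inclusive).

    wavelengths must be sorted in ascending order without duplicates.
    Linear interpolation is an assumption, not the documented method.
    """
    if len(wavelengths) != len(reflectance) or len(wavelengths) < 2:
        raise ValueError("need matching sequences with at least two points")
    if start < wavelengths[0] or stop > wavelengths[-1]:
        raise ValueError("target range must lie inside the measured range")
    grid = list(range(start, stop + 1))
    resampled = []
    for w in grid:
        # hi: first measured wavelength strictly greater than w (clamped to the last point)
        hi = min(bisect_right(wavelengths, w), len(wavelengths) - 1)
        lo = hi - 1
        frac = (w - wavelengths[lo]) / (wavelengths[hi] - wavelengths[lo])
        resampled.append(reflectance[lo] + frac * (reflectance[hi] - reflectance[lo]))
    return grid, resampled


# Synthetic spectrum sampled every 3.3 nm from 310 nm (illustrative values only)
measured_nm = [310 + 3.3 * i for i in range(240)]
measured_r = [0.001 * w for w in measured_nm]
grid_nm, r_1nm = resample_to_1nm(measured_nm, measured_r)
print(len(grid_nm), grid_nm[0], grid_nm[-1])
```

Restricting the grid to 420-980 nm inside the function discards the noisy ends of the spectrum in the same step as the resampling.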
Derivative transformation can remove background interference, resolve overlapping spectra, and minimize the baseline drift of raw spectra caused by differences in grinding and optical setups 40. The first and second derivative transformations were performed in OriginPro 8.
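As a rough illustration of the derivative transformation, the sketch below applies a finite-difference derivative to a spectrum on a uniform 1 nm grid, and applies it twice to obtain the second derivative. The differencing scheme (central differences inside, one-sided at the ends) is an assumption and may differ from the routine in OriginPro 8; the function name `derivative` is hypothetical.

```python
def derivative(values, step=1.0):
    """Finite-difference derivative on a uniform grid.

    Central differences are used for interior points and one-sided
    differences at the two ends (an assumed scheme).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points")
    out = [0.0] * n
    out[0] = (values[1] - values[0]) / step
    out[-1] = (values[-1] - values[-2]) / step
    for i in range(1, n - 1):
        out[i] = (values[i + 1] - values[i - 1]) / (2.0 * step)
    return out


# Illustrative input: a quadratic "spectrum" on a 1 nm grid
r_raw = [float(w * w) for w in range(420, 431)]
r_first = derivative(r_raw)     # R'
r_second = derivative(r_first)  # R''
print(r_first[5], r_second[5])
```

Because differentiation amplifies high-frequency noise, derivative spectra are often smoothed beforehand; whether smoothing was applied here is not stated in the text.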
Spectral indices calculated. Ten commonly used spectral indices were calculated (Table 1). As shown in Table 1, all spectral indices except the water index (WI) are related to chlorophyll or pigment.
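A minimal sketch of how such indices are computed from a leaf spectrum is given below for three of them. The band positions are assumptions taken from commonly cited definitions (NDVI from 800 and 670 nm, NPCI from 680 and 430 nm, WI from 900 and 970 nm) and may not match the formulations in Table 1; the function names and reflectance values are hypothetical.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), the generic normalized-difference form."""
    return (a - b) / (a + b)


def spectral_indices(spectrum):
    """Compute three indices from a dict mapping wavelength (nm) to reflectance.

    Band choices are assumed, commonly cited definitions, not necessarily
    those used in the study.
    """
    r = spectrum
    return {
        "NDVI": normalized_difference(r[800], r[670]),
        "NPCI": normalized_difference(r[680], r[430]),
        "WI": r[900] / r[970],
    }


# Illustrative reflectance values for a green leaf (not measured data)
leaf = {430: 0.05, 670: 0.04, 680: 0.05, 800: 0.45, 900: 0.46, 970: 0.44}
for name, value in spectral_indices(leaf).items():
    print(name, round(value, 3))
```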

Variables selected and partial least-squares regression model built. Correlation analysis of E-HM concentrations with the raw spectral reflectance (R), the first-order derivative of R (R′), the second-order derivative of R (R′′) and the spectral indices was performed in SPSS (IBM SPSS Statistics 22) using bivariate correlation analysis. Variables with a significant correlation (P < 0.05) were selected for use in the model.
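The selection rule can be sketched as follows, assuming a two-tailed Pearson test. For n = 21 samples (19 degrees of freedom), the two-tailed critical t value at P = 0.05 is about 2.093, which corresponds to |r| of about 0.433. This is an illustration rather than the SPSS procedure; the function names and the synthetic data are hypothetical.

```python
import math

# Two-tailed Student t critical value at P = 0.05 for 19 degrees of freedom (n = 21)
T_CRIT_DF19 = 2.093


def pearson_r(x, y):
    """Pearson correlation coefficient (undefined if either input is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def critical_r(n, t_crit):
    """Smallest |r| that is significant for n samples, given the critical t value."""
    return t_crit / math.sqrt(t_crit ** 2 + (n - 2))


def select_variables(variables, target, t_crit=T_CRIT_DF19):
    """Keep variables whose |r| with the target reaches the critical value.

    variables maps a variable name to its values across the samples.
    """
    r_min = critical_r(len(target), t_crit)
    selected = {}
    for name, values in variables.items():
        r = pearson_r(values, target)
        if abs(r) >= r_min:
            selected[name] = r
    return selected


# Synthetic data for 21 samples (illustrative only)
e_cd = [float(i) for i in range(21)]
bands = {
    "band_A": [2.0 * v + 1.0 for v in e_cd],          # perfectly correlated
    "band_B": [(-1.0) ** i for i in range(21)],        # uncorrelated with e_cd
}
print(round(critical_r(21, T_CRIT_DF19), 3), sorted(select_variables(bands, e_cd)))
```

With several hundred bands tested at P < 0.05, some bands can pass by chance alone, which is one reason a latent-variable method is applied to the selected set afterwards.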
Partial least-squares regression (PLSR) with leave-one-out cross-validation was used to predict E-HM concentrations in farmland soil from the selected variables and spectral indices. PLSR is one of the most frequently used methods for estimating soil heavy metal concentrations with visible and near-infrared reflectance spectroscopy (VNIRS) [40][41][42]. It can process data with strong collinearity and noise, and is well suited to situations where the number of variables considerably exceeds the number of available samples 43,44. The PLSR and cross-validation were performed in TQ Analyst (8.3.125, Thermo Fisher Scientific Inc.). The performance of the PLSR models was assessed with two evaluation parameters computed between the measured and predicted values: the coefficient of determination (R2) and the root mean square error (RMSE). The R2 and the RMSE are commonly calculated using the following formulas 45 :
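In their most widely used form (stated here as an assumption; the exact expressions are those of reference 45), with $y_i$ the measured value, $\hat{y}_i$ the predicted value, $\bar{y}$ the mean of the measured values and $n$ the number of samples:

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2},
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}
```

Some software instead reports R2 as the squared Pearson correlation between measured and predicted values; the two definitions coincide only for an ordinary least-squares fit evaluated on its own calibration data.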

Results and Discussion
Heavy metal concentrations in agricultural soil. Descriptive statistics of the concentrations of Cd and Pb are reported in Table 2. The average concentration of Pb (29.193 mg kg−1) was below the limit (80 mg kg−1) set by the Ministry of Ecology and Environment of the People's Republic of China (MEEPRC) 46, while the average concentration of Cd (0.301 mg kg−1), which may be affected by human activities, was slightly higher than the limit (0.3 mg kg−1) set by MEEPRC. In addition, the concentration of Cd in four samples (4 out of 21) exceeded the limit set by MEEPRC. The mean concentration of Pb was higher than the mean concentration of Cd, but the relationship between the mean concentrations of E-Pb and E-Cd was reversed, because Cd is more available than Pb in soil 47,48.
The mean, SD and CV of E-Cd and E-Pb concentrations are also shown in Table 2. The CVs of E-Cd and E-Pb concentrations differed from those of Cd and Pb concentrations. Furthermore, soil with high concentrations of Cd and Pb may not have high concentrations of E-Cd and E-Pb. This may be because E-HM concentrations in natural soils depend on differences in the soil environment, such as pH and the contents of clay, sand and organic matter 9.

Relationship between heavy metal concentrations in soil and those in rice leaves. The Pearson's correlation coefficients between heavy metal concentrations in soil and in rice leaves are shown in Table 3. Only the Pearson's correlation coefficient between E-Cd in soil and Cd in rice leaves (0.649) was significant at the 0.05 level, whereas the coefficients between Pb in soil and Pb in rice leaves were 0.340 for the concentration of E-Pb and 0.222 for the total concentration of Pb. In our study, the concentrations of E-Cd were thus more strongly correlated with concentrations in rice leaves than the concentrations of E-Pb were. Earlier studies found that, in the common soil pH range, the stability of Cd is lower than that of Pb 49,50. Meanwhile, rice tends to accumulate Cd, and the accumulation of Cd in rice is often controlled to a greater extent by its bioavailability than by its total content in the soil 3. Therefore, the concentrations of Cd in rice leaves were more strongly correlated with E-Cd.

Relationship of E-HM concentrations against hyperspectral data. The Pearson's correlation coefficients between the E-HM concentrations and the processed reflectance (R, R′ and R′′) are shown in Fig. 2 and Table 4, and the Pearson's correlation coefficients between the E-HM concentrations and the spectral indices are summarized in Table 5. Wavelengths with correlations significant at P < 0.05 indicate bands that are sensitive to E-HM.

Table 1 (columns: spectral index name, abbreviation, formulation, reference).

From Table 4, we could see that the wavelengths of maximum positive correlation and of minimum (most negative) correlation differed between E-Cd and E-Pb. As shown in Fig. 2a, the number of bands associated with E-Cd gradually decreases as the processing progresses: 277 bands of R (in the range of 420-696 nm), 68 bands of R′ and 37 bands of R′′ had a significant relationship (P < 0.05) with E-Cd concentrations in soil. The correlated bands of R were continuous, while the correlated bands of R′ and R′′ were dispersed. Similar relationships between heavy metals and spectral data have been reported in the literature 51,52. This indicates that heavy metal stress leads to a spectral response, but that redundant information is contained among very close spectral bands 53,54. Pre-processing techniques can remove redundant information and make subtle information in the spectra clearer, which improves the subsequent multivariate regression 55.
In Fig. 2b, the trend of the relationships between the spectra and E-Pb concentrations was similar to that between the spectra and E-Cd concentrations, but in none of the R, R′ or R′′ correlograms did the correlation coefficients reach the 0.01 significance level (Table 4). This may be due to the low concentrations of E-Pb in the agricultural soil, which did not cause obvious stress in rice and had no obvious effect on the leaf spectra.
As shown in Table 5, spectral indices showed a wide range of correlations with the concentrations of E-Cd (−0.705-0.222) and E-Pb (−0.35-0.259). All spectral indices were negatively related to E-Cd concentrations, and four of them (NDVI, CRI, PRI2 and NPCI) were significantly correlated (P < 0.05) with E-Cd content, whereas no index was significantly correlated with E-Pb concentrations. The four spectral indices associated with E-Cd concentrations were leaf pigment-related indices. The index related to leaf water content (WI) had no significant correlation with E-Cd concentrations. Cd can damage the structure of chloroplasts, as manifested by the disturbed shape and the dilation of the thylakoid membranes 56, so the indices associated with leaf pigment were more susceptible to Cd. However, rice water content is resistant to Cd when the mass fraction of Cd in farmland soil is 2.0-3.0 mg/kg 57. According to Table 2, the mean content of Cd in the agricultural soil was 0.3 mg/kg, which lies within the range to which rice water content is resistant. Therefore, the spectral index related to water content had no significant association with E-Cd concentrations.

Table 3. The Pearson's correlation coefficients between heavy metal concentrations in the soil and heavy metal concentrations in rice leaves. * means significant at the 0.05 level.

Figure 2. Correlations between processed reflectance spectra (R, raw reflectance; R′, first derivative spectra; R′′, second derivative spectra) and E-Cd (a) and E-Pb (b) concentrations in soil from Zhangjiagang city.

Table 4. Correlation analysis between E-HM concentrations and transformations of spectra. ** means significant at the 0.01 level, * means significant at the 0.05 level.
Model development and validation. We selected 386 and 209 variables for the models of E-Cd and E-Pb concentrations, respectively, whereas the number of samples was 21; moreover, most of the selected variables were strongly collinear, so PLSR was well suited to this study. The relationships between measured and predicted concentrations of E-HM are presented in Fig. 3. A proper model should have an R2 close to 1 and a low RMSE 7. The PLSR model was able to predict E-Cd content, with a higher coefficient of determination (R2 = 0.592) and a low RMSE (0.046) (Fig. 3a). In contrast, the PLSR prediction of E-Pb concentrations was poor, with an RMSE of 0.019 and an R2 of only 0.013 (Fig. 3b). It is known from the literature that Cd is the best-known toxic heavy metal and is taken up by the calcium uptake system in plants 58, while soil has a higher binding capacity for Pb than for Cd 59, making Pb less bioavailable. Table 2 also shows that the ratio of E-Cd to total Cd (E-Cd mean/Cd mean = 0.051/0.301 = 0.17) was higher than the ratio of E-Pb to total Pb (E-Pb mean/Pb mean = 0.01/29.193 ≈ 0.0003), so rice may have been stressed by Cd but not by Pb. The accuracy of the PLSR model of E-Cd concentrations was not very high, possibly because only 21 sampling points were used for model development and validation, which affects the robustness of the models.
In summary, if rice is sensitive to E-HM or is stressed by a certain concentration of E-HM, the PLSR model based on pretreated reflectance from hyperspectral data of rice leaves is capable of estimating E-HM concentrations.
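The validation scheme can be sketched in a few lines. The example below fits a PLS1 model with a single latent variable and evaluates it by leave-one-out cross-validation. It is a simplified stand-in for the TQ Analyst workflow: the number of latent variables used in the study is not stated, R2 is computed as 1 − SSE/SST (an assumption), and all function names and data are hypothetical.

```python
import math


def fit_pls1_one_component(X, y):
    """Fit PLS1 with one latent variable on mean-centred data.

    Returns (x_mean, y_mean, coef) so that a prediction is
    y_mean + sum(coef[j] * (x[j] - x_mean[j])).
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: direction of maximum covariance between X and y
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores of the samples on the latent variable, then regress y on the scores
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(t[i] * yc[i] for i in range(n)) / sum(v * v for v in t)
    return x_mean, y_mean, [q * v for v in w]


def predict(model, x):
    x_mean, y_mean, coef = model
    return y_mean + sum(c * (xi - m) for c, xi, m in zip(coef, x, x_mean))


def leave_one_out_predictions(X, y):
    """Predict each sample from a model fitted on all the other samples."""
    preds = []
    for i in range(len(y)):
        model = fit_pls1_one_component(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        preds.append(predict(model, X[i]))
    return preds


def r2_and_rmse(measured, predicted):
    n = len(measured)
    mean = sum(measured) / n
    sse = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    sst = sum((m - mean) ** 2 for m in measured)
    return 1.0 - sse / sst, math.sqrt(sse / n)


# Synthetic, perfectly collinear predictors (illustrative only)
X_demo = [[float(a), 2.0 * a] for a in range(6)]
y_demo = [3.0 * a + 1.0 for a in range(6)]
print(r2_and_rmse(y_demo, leave_one_out_predictions(X_demo, y_demo)))
```

Each sample is predicted by a model that never saw it, so the resulting R2 and RMSE describe cross-validated performance rather than calibration fit; with only 21 samples, they remain sensitive to individual points.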

Conclusions
In the present study, the concentration of Cd at 19.05% of the sampling points exceeded the limit set by MEEPRC for agricultural soil in Zhangjiagang city, and the concentration of E-Cd in soil was significantly correlated with the concentration of Cd in rice leaves. However, owing to the low concentration and low bioavailability of Pb, the concentration of E-Pb in soil was not significantly correlated with the concentration of Pb in rice leaves.
The raw reflectance contained redundant information, and pre-processing techniques could remove this redundancy and make subtle information in the spectra clearer. The number of bands associated with E-HM therefore gradually decreased as the processing progressed (R > R′ > R′′). The number of bands associated with E-Cd was greater than that associated with E-Pb, and the correlation between E-Cd concentrations and spectral data was higher than that between E-Pb concentrations and spectral data. Meanwhile, because of the low concentration of E-Pb and the resistance of rice water content to Cd, four indices (NDVI, CRI, PRI2 and NPCI), all related to chlorophyll or pigment, were significantly correlated with E-Cd concentrations.

Table 5. The Pearson's correlation coefficients between the E-HM concentrations and spectral indices. ** means significant at the 0.01 level, * means significant at the 0.05 level.

The PLSR model was able to estimate E-Cd concentrations in agricultural soil, but could not estimate E-Pb concentrations because of the low concentration of E-Pb. Thus, if the crop is sensitive to E-HM or is stressed by E-HM, the PLSR model is capable of estimating E-HM concentrations in soil.
Using hyperspectral data to evaluate E-HM content in agricultural soil is not affected by soil chemical properties (such as soil pH, organic matter content and soil texture). The approach can directly reflect the toxicity of heavy metals in soil, and it has a wider range of applications and gives a more accurate result than assessment based on total heavy metal concentrations. This method may provide new insight into monitoring the E-HM content of agricultural soil. However, the number of samples was too low to allow external validation, so more samples will be collected in the future to improve predictive performance, and more heavy metals will be estimated to test the robustness of the model.