Biomagnetic monitoring combined with support vector machine: a new opportunity for predicting particle-bound-heavy metals

Biomagnetic monitoring includes fast and simple methods to estimate airborne heavy metals. Leaves of Osmanthus fragrans Lour and Ligustrum lucidum Ait were collected simultaneously with PM10 from a mega-city of China during one year. Magnetic properties of leaves and metal concentrations in PM10 were analyzed. Metal concentrations were estimated using leaf magnetic properties and meteorological factors as input variables in support vector machine (SVM) models. The mean concentrations of many metals were highest in winter and lowest in summer. Hazard index for potentially toxic metals was 5.77, a level considered unsafe. The combined carcinogenic risk was higher than precautionary value (10−4). Ferrimagnetic minerals were dominant magnetic minerals in leaves. Principal component analysis indicated iron & steel industry and soil dust were the common sources for many metals and magnetic minerals on leaves. However, the poor simulation results obtained with multiple linear regression confirmed strong nonlinear relationships between metal concentrations and leaf magnetic properties. SVM models including leaf magnetic variables as inputs yielded better simulation results for all elements. Simulations were promising for Ti, Cd and Zn, whereas relatively poor for Ni. Our study demonstrates the feasibility of prediction of airborne heavy metals based on biomagnetic monitoring of tree leaves.

Atmospheric particulate matter (PM) pollution is a serious global environmental problem and it has been closely linked to a wide range of adverse health outcomes [1][2][3][4] . Besides the size and concentration of PM, there is growing evidence that its chemical composition, especially its trace metal components, is crucial to its toxicity [5][6][7] . The metal particles adsorbed to PM can be transported over large distances and deposited in soils and water bodies or on the leaves of plants via wet and dry deposition, thereby posing environmental risks 8,9 . Consequently, there is an urgent need to investigate the pollution levels caused by particle-bound heavy metals in urban areas.
Tree leaves have large surface areas and thus serve as efficient passive collectors of atmospheric dust 10 . In general, coarse PM is usually deposited on leaf surfaces whereas finer PM is trapped in leaf waxes 11 . With their wide distribution in urban areas and ease of sampling, tree leaves have been used to investigate the spatial and temporal patterns of atmospheric pollutants, including PM 12,13 , heavy metals 10,14,15 , polycyclic aromatic hydrocarbons 16,17 , and nitrogen oxides 18,19 . Geochemical methods for determinations of atmospheric heavy metals involve the collection and processing of PM filter samples. Alternative methods include atomic absorption spectrometry, inductively coupled plasma atomic emission spectroscopy and ICP mass spectrometry, but they are laborious and time-consuming and therefore not suited to large-scale pollution monitoring. However, the magnetic particles adsorbed on tree leaves are a good proxy to measure pollution caused atmospheric heavy metals. Several reports have focused on the relationships between leaf magnetic properties and metal concentrations in leaf samples 10,[20][21][22][23] or in deposited atmospheric dust 23,24 . However, their methods cannot be used to directly analyze actual pollution caused by particle-bound heavy metals in the atmosphere, because metal concentrations are normally measured using PM of a certain particle size and collected on pumped-air filters. By contrast, few studies have evaluated the statistical relationships between leaf magnetic properties and heavy metals trapped on pumped-air filters 14 .
The levels of atmospheric pollutants can be predicted using deterministic and statistical models. Deterministic models usually require information about pollution sources and emission quantity, as well as sufficient knowledge about the physical processes and chemical reactions among pollutants 25,26 . Statistical models are simpler and better-suited to identifying the dependencies underlying pollutant concentrations and their potential predictors; as such, they often have a higher accuracy 27,28 . Among the statistical approaches used to forecast atmospheric pollutant levels are multiple linear regression 29,30 , grey model 31 , clusterwise regression 32 , random forest partition model 33 , artificial neural networks 34,35 , support vector machine 36,37 and hybrid models 38,39 . Among those methods, the support vector machine (SVM) algorithm, which is based on the structural risk minimization principle, has increasingly been applied to solve non-linear regression problems, because it takes into account the error approximation in the data and generalization of the models. Previous studies used SVM approaches to predict a series of atmospheric pollutants, including NO 2  43 , based on emission information and meteorological data. However, the potential of statistical models combined with leaf magnetic properties to predict atmospheric heavy metals has yet to be fully explored.
In this study, we examined the relationship between leaf magnetic characteristics and heavy metal concentrations in atmospheric PM 10 from a large metropolitan city in China. Based on this relationship, we developed a method that uses leaf magnetic properties and meteorological factors as input variables in non-linear statistical models to accurately predict atmospheric particle-bound heavy metals.
Results and Discussion pM 10 concentrations. As shown in Fig. 1, approximately 88% of the daily PM 10 concentrations were higher than the 24-h guideline value of 50 μg/m 3 proposed by the World Health Organization (WHO), whereas <5% of the daily PM 10 concentrations were above the 24-h Chinese National Ambient Air Quality Standard (NAAQS) limit of 150 μg/m 3 . The annual mean PM 10 concentration in the study area was 84 μg/m 3 (range: 42-164 μg/m 3 ), which was slightly higher than the annual limit of 70 μg/m 3 set by the NAAQS and much higher than the annual guideline value of 20 µg/m 3 proposed by the WHO. The mean PM 10 concentration changed seasonally, decreasing in the order of 105 μg/m 3 (range: 58-164 μg/m 3 ) in winter, 97 μg/m 3 (range: 57-157 μg/m 3 ) in spring, 75 μg/m 3 in autumn (range: 42-133 μg/m 3 ), and 64 μg/m 3 in summer (range: 42-88 μg/m 3 ). The temporal trends of the meteorological parameters and atmospheric pollutants during the sampling period are shown in Supplementary Figs. S1 and S2. The concentrations of atmospheric pollutants including PM 10 , PM 2.5 , SO 2 , NO 2 and CO were higher in winter mainly because of the emissions of domestic heating systems and the unfavorable meteorological conditions, such as low wind speed and low temperature, which enhance the accumulation of air pollutants 44 . The lower concentrations of atmospheric pollutants during summer were related with the high temperature, abundant rain and the relatively strong diffusion capacity 45 .
concentrations of particle-bound-heavy metals. The mean concentrations of the particulate-bound elements are summarized in Table 1. In general, Fe and Zn were the most abundant heavy metals in PM 10 , whereas Co and Cd were present at lower concentrations. The mean concentrations of Cr, Cu, Fe, Mn, Pb and Zn were highest in winter, those of Cd, Ni and V were highest in autumn, and those of As, Co and Ti were highest in spring. The mean concentrations of most of the measured elements (except As and Co) were lowest in summer.
In assessments of heavy metal contaminations, the enrichment factor (EF), calculated by normalizing a tested element against a conservative reference element, is commonly used to distinguish between anthropogenic influences and natural background levels 46,47 . The calculation and classification of the EF as applied in this study can be found in the Supplementary Information. As shown in Fig. 2, there were no obvious seasonal difference in the EFs of the different elements. The mean EFs of Co, Fe, Mn, and V were <10, indicative of a minimal enrichment of these metals and their having originated mainly from crustal sources. Cr was moderately enriched (10 < EF < 100), whereas As, Cd, Cu, Ni, Pb and Zn (EF > 100) were anomalously enriched. Thus, all of these elements were likely to have derived from anthropogenic sources, such as steel smelting, fly ash from coal burning, vehicle emissions, waste incineration, and contaminated soil 48 . By contrast, the As concentration in 64.3% and 53.6% of the PM 10 samples exceeded the NAAQS limit of 6 ng/m 3 and 6.6 ng/m 3 , respectively. The Ni concentration in nearly all of the PM 10 samples was above the WHO limit of 25 ng/m 3 . The health risk caused by exposure to the analyzed airborne metals via inhalation is shown in Supplementary Table S1. Among the studied metals, Ni, Pb and As had higher EC values. The HQ values for the inhalation of As, Cd, Co, Cr, Mn and V were below the safe limit (1) for both children and adults. Ni had the highest HQ value (3.83) in these two subpopulations. The hazard index (HI) for these metals was 5.77, which was above the safe limit (1), indicating accumulative noncarcinogenic risks for adults and children. For carcinogens, the acceptable risk range is between 1 × 10 −6 and 1×10 −4 according to the US EPA's risk management policy. The carcinogenic risks of As, Cd, Co, Ni, and Pb inhaled from PM 10 were less than the precautionary value (10 −4 ), both for children and adults, but the carcinogenic risks of Cr for the two subpopulations were higher. The combined carcinogenic risk was 2.39 × 10 −4 for children and 9.55 × 10 −4 for adults. Both values were higher than the precautionary value. Thus, for every one million children and one million adults living in the local environment, approximately 3 children and 10 adults are at risk of developing cancer during their lifetime due to exposure to toxic metals via PM 10 inhalation.
Leaf magnetic properties. Both χ LF and SIRM generally reflect the quantities of magnetic, and especially ferromagnetic (e.g., magnetite) minerals in a sample, but they are also influenced by the presence of paramagnetic and diamagnetic minerals 49 . χ ARM is particularly sensitive to single-domain ferrimagnetic grains 50 . As shown in Table 2, in the leaves of both tree species χ LF and SIRM decreased seasonally, in the order of winter > spring > autumn > summer. However, there were differences in the seasonal pattern of χ ARM , which decreased in the order of winter > autumn > spring> summer for Osmanthus fragrans Lour, and in the order summer > winter > autumn > spring for Ligustrum lucidum Ait. As shown in Fig. 3, the SIRM and χLF values correlated linearly, suggesting that ferrimagnetic minerals were the dominant magnetic minerals in the leaf samples 48 . The lack of a significant correlation between χ LF and χ ARM (Fig. 3) indicated that single-domain grains did not dominate these ferrimagnetic minerals. The ratios of χ ARM to χ LF, χ ARM to SIRM, and SIRM to χ LF are used to estimate mineral magnetic grain-size variations, with increasing ratios indicating decreasing grain size 48  www.nature.com/scientificreports www.nature.com/scientificreports/ all three ratios were lowest in spring and higher in autumn and summer, which suggested leaf accumulation of a larger number of finer grains in the latter two seasons.
The SIRM values and the ratio of SIRM to χ LF were significantly higher in Ligustrum lucidum Ait than in Osmanthus fragrans Lour, whereas the ratios of χ ARM to χ LF and χ ARM to SIRM were significantly lower in Ligustrum lucidum Ait than in Osmanthus fragrans Lour. The annual mean values of χ LF and SIRM for Osmanthus fragrans Lour were 2.07 ± 1.19 × 10 −8 m 3 /kg and 274 ± 100 × 10 −6 Am 2 /kg, respectively, and for Ligustrum lucidum Ait 2.10 ± 0.93 × 10 −8 m 3 /kg and 333 ± 75.0 × 10 −6 Am 2 /kg, respectively (Table 3). According to a review of Hofman et al. 52 and a previous study by our group 14 , the SIRM values of the two tree species were in the middle range while the χ LF values were lower than published values. Several different factors determine leaf magnetic properties. Particle accumulation by leaves is influenced by species-specific characteristics of the trees, such as phenology, growth status, leaf area density and leaf characteristics, e.g., wax layer properties, surface roughness and trichomes presence 53 . However, sampling height, the leaf exposure period, PM source distance and strength, as well as meteorological conditions, e.g., wind, rain, drought and seasonal dynamics, also play important roles 52 . Further work is needed to reveal the underlying relationships among meteorological conditions, airborne heavy metals from various sources and leaf properties.
principal component analysis (pcA). The relationships among heavy metals, PM 10 , meteorological factors and magnetic parameters were analyzed in a PCA (Supplementary Tables S2 and S3). For the leaf magnetic parameters of Osmanthus fragrans Lour, five factors, accounting for 76.041% of the total variance, were obtained. The first factor, accounting for 35.272% of the total variance, was dominated by Cr, Cu, Fe, Mn, Pb, Ti, Zn, PM 10 , temperature, pressure, χ LF and SIRM, which indicated that the metal sources were the iron and steel industry and soil dust. Factor 2, accounting for 15.619% of the total variance, was dominated by Cd, Ni, V, wind speed, χ ARM , χ ARM /χ LF , χ ARM /SIRM, and SIRM/χ LF , indicating industrial activities that resulted in the release of magnetic    www.nature.com/scientificreports www.nature.com/scientificreports/ minerals of a certain grain size. Factor 3 explained 11.765% of the total variance and was dominated by As and Co. Arsenic is a typical element associated with coal combustion 54 , whereas Co, with EF < 10, is indicative of crustal source. Therefore, this factor may reflect mixed sources of coal combustion and natural process. Factor 4, accounting for 8.998% of the total variance, was dominated by relative humidity, whereas factor 5, dominated by Co and Cu and representing 4.386% of the total variance, indicated traffic activities and road dust as the major sources 55 .
Simulation results and implications. The linkage of atmospheric heavy metals and magnetic particles is based on the fact that heavy metals such as Zn, Cd, Pb and Cr can be incorporated into the particle structure during combustion processes and/or by subsequent surface adsorption 56,57 . Magnetic properties can thus act as an effective proxy for airborne heavy metals. The influence of leaf magnetic properties on heavy metal accumulation was examined in this study by predicting metal concentrations with and without leaf magnetic variables while including PM 10 concentrations and meteorological factors ( Table 3). The R, MAE and RMSE results are listed in Supplementary Tables S4-S7. The predicted vs. observed concentration and the residuals were plotted for Pb and are shown in Fig. 4.
When only the PM 10 concentration and meteorological factors served as SVM inputs, the training R value of all the studied elements was between 0.565 and 0.819, and the test R value between 0.528 and 0.816. The training R and test R values of Cu were the lowest (0.565 and 0.528, respectively), and those of Ti the highest (0.819 and 0.816, respectively). The training R and test R values of Co, Cr, Fe, Mn and Ni were between 0.6 and 0.7, and those of As, Cd, Pb, V and Zn between 0.7 and 0.8.
However, the simulation results of the stepwise MLR were not satisfactory, even when leaf magnetic variables were included as input variables (Supplementary Table S5). For Osmanthus fragrans Lour, the training R values of the metals were between 0.587 (Mn) and 0.780 (Pb); for the test R, the values were <0.6, except in the case of Zn with test R of 0.693. For Ligustrum lucidum Ait, the training R values of all of metals were between 0.494 (Mn) and 0.681 (Pb); for the test R, the values were <0.6, with the exceptions of Cd and Co. These results obtained using a linear approach demonstrated the strong nonlinear relationships between metal concentrations and the input variables, which is consistent with our previous findings 30,43 .
When The improvement in the models achieved by including leaf magnetic properties as inputs was quantified by calculating the improvement rates (IRs) of model IV and model V for the training R and test R of each metal. The IR was calculated as follows 43 : model IV/V m odel I m odel I As shown in Fig. 5, in the training stage, the IRs of Co, Cu and Mn were better whereas those of As, Ni and Pb were relatively poor, both for model IV and model V. In the test stage, the IR values of Cu, Co and Cr of model IV were higher, as were those of Cu, Co, Zn and Fe of model V (both models: >0.15). By contrast, the IR values of Ni and V of model IV and those of Mn, Ni, Cr and Pb of model V were lower (both models: <0.05). In general, Cu and Co, both of which were attributed in the PCA to traffic activities and road dust, improved by the inclusion of the leaf magnetic properties of Osmanthus fragrans Lour and Ligustrum lucidum Ait. In our previous report 43 , the IRs of Co, Cu, Fe, Mn and Zn were also better with the inclusion of magnetic variables of PM 2.5 when simulating the mass-related concentrations of heavy metals in PM 2.5 by using SVM models.
Model IV, in which the leaf magnetic properties of Osmanthus fragrans Lour were included as inputs, showed better simulation effects for As, Cd, Cr, Cu, Mn, Ni, Pb and Ti. For these metals, the training and test R values of this model were higher than those of model V (Ligustrum lucidum Ait), whereas the simulation of Co, Fe, V and Zn was better in model V. This result was mainly related to the morphological and physiological characteristics of the two tree species (e.g., leaf area density, wax layer properties, surface roughness) 58 but the slight differences in the ambient environments (e.g., soil dust, road traffic, buildings) were also likely to have played a role 59 .
In general, models IV and V performed best for Ti, Cd and Zn, as evidenced by training R and test R values > 0.8. For Ni, however, the performances of these models were relatively poor, based on training R and test R values < 0.7. Ti was identified as a crustal element, whereas Ni, with the highest noncarcinogenic health risk and originating from mixed industrial activities, had a relatively poor simulation and very little improvement afforded by the inclusion of leaf magnetic properties. This finding is consistent with previous reports of a more reliable linkage between heavy metal concentrations and magnetic parameters in environments with similar and/or "single" source contributions, whereas multiple sources of heterogeneous chemical and magnetic particle can complicate determinations of the relationships between atmospheric heavy metals and magnetic parameters 52,60 . Source-specific magnetic fingerprints and their associations with atmospheric heavy metals remain to be elucidated in further research.

conclusions
The linkage between heavy metals in PM 10 and the leaves of Osmanthus fragrans Lour and Ligustrum lucidum AitLigustrum lucidum Aitc were studied using SVM models. The annual mean PM 10 concentration was 84 μg/m 3 (range: 42-164 μg/m 3 ). The elements As, Cd, Cu, Ni, Pb and Zn were anomalously enriched and Cr was moderately enriched whereas Co, Fe, Mn, and V were mainly from crustal sources. Ni had the highest noncarcinogenic  www.nature.com/scientificreports www.nature.com/scientificreports/ risk, and Cr the highest carcinogenic risks. The combined noncarcinogenic and carcinogenic risks posed by inhalation exposure to airborne heavy metals were both above the safe limit or precautionary level. The χ LF and SIRM values of the leaves of both tree species decreased in the order of winter > spring > autumn > summer, and the χ ARM values in the order of winter > autumn > spring > summer. The dominant magnetic minerals in the leaf samples were ferrimagnetic minerals. PCA revealed that the heavy metals in PM 10 have common sources with the magnetic minerals in leaf samples.
A subset of PM 10 concentrations, meteorological factors and leaf magnetic properties were then used as input variables to simulate heavy metals concentrations. The poor simulation results obtained by MLR evidenced the nonlinear relationships between the airborne metal concentrations and the input variables. The inclusion of leaf magnetic variables improved the simulation results for all of the studied elements, with the largest improvements in Cu and Co and the lowest improvement in Ni. SVM models with leaf magnetic variables of the two tree species as inputs performed better for Ti, Cd and Zn but relatively poorly for Ni. Our study thus demonstrates that the concentrations of most airborne toxic heavy metals can be estimated using a simple and efficient biomagnetic diagnostic method.

Methods
Sampling. Nanjing (118°46′E, 32°03′N), the second largest city in the Yangtze River Delta region of China, is an important industrial production area and the main transportation hub in southeastern China. It has a north subtropical monsoon climate with a mean annual temperature of 16 °C and a mean annual precipitation of 1106 mm. PM 10 samples were collected on Whatman quartz microfibers using medium-volume PM samplers (model XY-2200, Qingdao Xuyu Environmental Co., Ltd., China) with a flow rate of 100 L/min from the Xianlin Campus of Nanjing University ( Supplementary Fig. S4), located in the northern suburbs of Nanjing and near the city's northern industrial districts. Continuous sampling of PM 10 lasted 32 h with 8 h (7:00 am-15:00 pm) per day was conducted, from December 4, 2015 to February 28, 2016 (winter), March 2 to May 28, 2016 (spring), June 2 to August 31, 2016 (summer) and September 4 to November 30, 2016 (autumn). To ensure the samples were representative, we avoided sampling during rainy or windy weather. A total of 84 PM 10 samples were collected. Meteorological data were recorded synchronously at an automatic air quality monitoring station located near the study site. Before and after sampling, the filters were conditioned for 48 h in a desiccator at 25 °C and 40% relative humidity, and then weighed to determine PM 10 mass.
Osmanthus fragrans Lour and Ligustrum lucidum Ait, two species of evergreen trees widely distributed in Nanjing, were selected for leaf sampling because the hairs on their leaf surfaces facilitate the adsorption of atmospheric particles. Leaf samples of the two trees were collected every 4 day during the same duration as PM sampling to keep consistence with the 32 h-sampled PM filters, from a site ~400 m away from the PM sampling site (Fig. S4). Specifically, each leaf sample was collected on the forth day of PM sampling to ensure the necessary accumulation of PM on leaves. The 4-day sampling duration both of PM 10 and tree leaf was chosen by considering the effects of meteorological conditions and the daily variation of PM 10 concentration from Nanjing. The distance between the two tree species was <100 m. For each tree species, two healthy trees next to each other were selected from which four leaves were collected from each one using ceramic scissors. The trees used for sampling in this study were all 3-to 4-year-old with a height of 2-3 m. The leaves were obtained from different sides of the tree and at a height from the ground of 1.5-2.0 m. The eight leaves were pooled to obtain one leaf sample, with 84 leaf samples collected in total for each tree species. The leaf samples were immediately placed in polyethylene bags and kept in a refrigerator at 4 °C. All leaf samples were totally dried in an oven at 55-60 °C before the analysis.
Magnetic measurements. Low frequency (0.875 kHz; χ LF ) was measured on about 2 g dried leaf sample using a KLY-3S kappa bridge (Agico, Czech Republic). Isothermal Remanent Magnetization (IRM) was induced in a field of 1000 mT (SIRM) using a Molspin pulse magnetizer. The anhysteretic remanent magnetization (ARM) was realized using a DTECH AF demagnetizer (Molspin, UK), delivering 0.04 mT of direct current (DC) and a peak alternating field of 100 mT. Measurements are then expressed as susceptibility of ARM (χ ARM ) by dividing the remanence by the DC bias field. For the purpose of quantitation, all magnetic properties of each leaf sample were normalized on mass-specific basis. www.nature.com/scientificreports www.nature.com/scientificreports/ Analysis of heavy metal concentrations. Metal elements were released from the PM 10 samples by digestion with a mixture of HNO 3 , HCl and HF. The concentrations of Fe and Zn were determined using inductively coupled plasma optical emission spectrometry (Perkin Elmer SCIEX, Optima 5300 DV, Norway). The concentrations of As, Cd, Co, Cr, Cu, Mn, Ni, Pb, Ti, V and Zn were determined by using inductively coupled plasma mass spectrometry (Perkin Elmer SCIEX, Elan 9000, Norway). Four blank filters were also digested and measured for metal concentrations simultaneously. Then the concentration of one element was corrected by subtracting its average concentration of blank filters. SRM 1649a (urban particulate matter) was used for quality assurance and control with the recovery of all the studied elements between 88% and 109%.
Health risk assessment. The carcinogenic and noncarcinogenic risks posed by potentially toxic metals through the direct inhalation of PM 10 were calculated using the human health risk assessment models of the US Environmental Protection Agency 61,62 . The models include exposure assessment and risk characterization. Sensitivity was determined for children and adults. The inhalation exposure concentration (EC), hazard quotient (HQ) of the noncarcinogenic risk, and the carcinogenic risk (CR) were calculated as described in the Supporting Information. The hazard index (HI) is equal to the sum of the HQ and was used to assess the overall potential of noncarcinogenic effects.
Simulation models. Although initially developed for classification problems 63 , SVM models have been extended to solve nonlinear regression estimations by the introduction of an ε-insensitive loss function. Detailed information on SVM theory is provided in several publications 36,37,64 and is thus described only briefly in the following: Firstly, a kernel function is used to map the input variables to a high-dimensional feature space, after which the SVM approximates a set of data with a linear function: where ∅ x ( ) i is the features of the input variables after their kernel transformation, and ω i and b are the coefficients estimated by minimizing the regularized risk function. After kernel transformation, the data are linearly separable in the new feature space. In this study, the Gaussian radial basis function kernel was applied: where γ is the parameter of the kernel, and xi and xj are two independent variables. The coefficients ω i and b are estimated by minimizing the regularized risk function: is the empirical error (risk), measured using the ε-insensitive loss function: , otherwise (5) i i where ε is a prescribed parameter called the regularized term and is defined as the approximation accuracy of the training data points. The loss function ignores errors when their value is less than that of ε. The term ω || || 1/2 2 is the regularization term, which serves as a measure of function flatness. The value of the regularized constant C determines the trade-off between empirical error and the regularization term. Finally, the dual problem of Eq. (4) is often resolved by the introduction of the Lagrange multiplier method: where α i and α ⁎ i are the introduced Lagrange multipliers. In this study, MATLAB R2013a and libsvm-3.21 were used to build the SVM models. The data were randomly partitioned into two sets: 80% for training and 20% for testing. The maximum and minimum concentrations of one target element observed during the sampling period were retained in the training set to develop a reliable model. A subset of PM 10 concentrations, meteorological factors (temperature, relative humidity, pressure, and wind speed), with or without leaf magnetic properties (χ LF , χ ARM , SIRM, χ ARM /χ LF , χ ARM /SIRM and SIRM/χ LF ), were used as the input variables. Among the successful models, the best model was selected based on the higher correlation coefficient of the observed versus predicted output and on fewer errors in the training and test stages.
Models using a stepwise multiple linear regression (MLR) were also established to simulate metal concentrations with the same independent variables used in the SVM models, by applying SPSS 23.0. As shown in Table 3, five models were developed according to the statistical methods and input variables. www.nature.com/scientificreports www.nature.com/scientificreports/ error (MAE) and root mean squared error (MSE), which provide a global estimate of the difference between the observed and predicted outputs, were used to measure residual errors. In general, a higher R combined with a lower MAE and RMSE was considered to indicate better modeling of a metal element. R, MAE and RMSE were calculated as described in the Supplementary Information.

Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.