Soil type recognition as improved by genetic algorithm-based variable selection using near infrared spectroscopy and partial least squares discriminant analysis

Xie, Hongtu; Zhao, Jinsong; Wang, Qiubing; Sui, Yueyu; Wang, Jingkuan; Yang, Xueming; Zhang, Xudong; Liang, Chao

doi:10.1038/srep10930

Download PDF

Article
Open access
Published: 18 June 2015

Soil type recognition as improved by genetic algorithm-based variable selection using near infrared spectroscopy and partial least squares discriminant analysis

Hongtu Xie^1,2^na1,
Jinsong Zhao³^na1,
Qiubing Wang⁴^na1,
Yueyu Sui⁵^na1,
Jingkuan Wang⁴^na1,
Xueming Yang⁶^na1,
Xudong Zhang^1,7^na1 &
…
Chao Liang^1,8^na1

Scientific Reports volume 5, Article number: 10930 (2015) Cite this article

3761 Accesses
22 Citations
1 Altmetric
Metrics details

Subjects

Abstract

Soil types have traditionally been determined by soil physical and chemical properties, diagnostic horizons and pedogenic processes based on a given classification system. This is a laborious and time consuming process. Near infrared (NIR) spectroscopy can comprehensively characterize soil properties and may provide a viable alternative method for soil type recognition. Here, we presented a partial least squares discriminant analysis (PLSDA) method based on the NIR spectra for the accurate recognition of the types of 230 soil samples collected from farmland topsoils (0–10 cm), representing 5 different soil classes (Albic Luvisols, Haplic Luvisols, Chernozems, Eutric Cambisols and Phaeozems) in northeast China. We found that the PLSDA had an internal validation accuracy of 89% and external validation accuracy of 83% on average, while variable selection with the genetic algorithm (GA and GA-PLSDA) improved this to 92% and 93%. Our results indicate that the GA variable selection technique can significantly improve the accuracy rate of soil type recognition using NIR spectroscopy, suggesting that the proposed methodology is a promising alternative for recognizing soil types using NIR spectroscopy.

Global energy use and carbon emissions from irrigated agriculture

Article Open access 10 April 2024

Hotspots of biogeochemical activity linked to aridity and plant traits across global drylands

Article 12 April 2024

Towards establishing a fungal economics spectrum in soil saprobic fungi

Article Open access 18 April 2024

Introduction

Soil is a heterogeneous mixture of minerals and organic matter created by long-term pedogenic processes, which result in a variety of distinct types with differing properties and qualities. A soil’s type determines key physical and chemical properties and provides vital context for agricultural and ecological processes involving that soil. Soil type has traditionally been determined through a time-consuming combination of field investigations and laboratory analyses, both of which require specialized well-trained expertise. One emerging alternative to these approaches is the use of infrared spectroscopy. Infrared spectroscopy is a technique that uses the infrared spectrum of adsorption, emission, photoconductivity of a material; this technique has been successfully applied in various fields, including investigation of agricultural products, foodstuffs and pharmaceutical products¹. In soil science, infrared spectroscopy is an established technique for analyzing soil samples in a fast, cost-effective, non-destructive manner that does not require hazardous chemicals^2,3,4. Features within the infrared spectrum can be empirically associated with various soil properties, such as soil organic matter, texture, clay mineralogy, nutrient availability, fertility, structure and microbial activity^2,5,6,7,8,9.

While infrared spectroscopy is not a new technique, its utility has been greatly improved by recent advances in mathematical and statistical methods for extracting information from multivariate and spectral data¹⁰. Pattern recognition methods were developed to explore similarity among groups of multivariate data¹¹ and have been employed to identify soil samples using mid-infrared photoacoustic spectroscopy (MIR-PAS)⁴ and near-infrared spectroscopy (NIR)³. Chemometrics methods used in identification and classification of soils include principal component analysis (PCA)^4,12,13, partial least squares (PLS) and artificial neural networks (ANN)¹⁴, cluster analysis¹⁵, linear discriminant analysis (LDA)¹⁶, soft independent modeling of class analogy (SIMCA)¹⁶ and partial least squares discriminant analysis (PLSDA)¹⁷. Among these methods, PLSDA is particularly effective at classification tasks^18,19 and has the capacity to deal with data multicollinearity²⁰.

Analytical methods for spectral data must frequently contend with redundant or uninformative features. Variable selection attempts to identify the most informative subset of a large number of variables with the aim of minimizing errors and excluding unreliable or noisy data^21,22. In addition, variable selection reduces model complexity, simplifying interpretation^22,23. Well-performed variable selection can improve model quality during calibration and enhance its resulting predictive performance^21,24. Among the numerous variable selection methods available for chemometric analysis, genetic algorithms (GA) have demonstrated superior efficiency in systems as diverse as biofuel composition analysis²⁵, olive oil classification²⁶ and detection of insect infestation²⁷. GAs mimic natural biological evolution to stochastically sample “populations” of variables drawn from a larger variable pool, with the aim of selecting a combination of variables that performs best under some specified criterion^28,29. GAs can be used to select subsets of features from spectral information, potentially enhancing the performance of subsequent classification methods³⁰. PLSDA models may be built from variables selected by GA (GA-PLSDA). Despite the potential benefits of this approach, we are unaware of any previous studies that have applied the GA-PLSDA method to soil classification.

In this study, we combine NIR spectroscopy data with PLSDA to categorize five soil types sampled from farmland in northeast China and explore the effect of GA variable selection on prediction accuracy. The objectives of this study were: 1) to illustrate a fast and accurate methodology for recognizing soils using NIR spectroscopy; and 2) to explore the effect of GA variable selection on PLSDA model calibration and performance.

Results and discussion

The five soil types we studied differed in their texture and soil organic carbon (SOC) content (Fig. 1). The Albic Luvisols were characterized by high SOC and silt contents and a low sand content. Eutric Cambisols had the highest sand and lowest clay contents, while Phaeozems had the highest clay content and relatively high SOC and silt contents. There were general differences in NIR spectra among the soil types as well (Fig. 2a). Chernozems had the highest average absorbance across the spectrum, although they were similar to Phaeozems and Eutric Cambisols, while Albic Luvisols had the lowest average absorbance (Fig. 2a). Within-type variance in absorbance of specific wavenumbers also differed among soil types (Fig. 2b). The different absorbance strength among samples is the research basis of using PLSDA models for prediction.

Using all 1816 NIR spectrum features, PLSDA achieved global internal validation (leave-one-out (LOO) cross validation) and external validation accuracies of 88.9% and 83.1%, respectively. LOO validation accuracy within soil types ranged from 63.2%-100.0% while external validation accuracy ranged from 62.5%-93.8% (Table 1). PLSDA was able to correctly classify all samples in the training set by using 26 latent variables, although external validation accuracy did not increase substantially above 13 variables (Fig. 3a).

Table 1 The recognition rates of soil types using PLSDA

Full size table

The GA-PLSDA selected 66 out of 1816 available features (3.6%). This model attained 98.0%, 92.2% and 93.5% training, LOO cross-validation and external validation accuracies, respectively (Fig. 3b). Within soil types, accuracies ranged from 94.7%-100% for the training set, 79.0%-97.9% for LOO cross-validation and 83.3%-100.0% for external validation (Table 2). Global internal and external validation accuracies were >90% with the GA-PLSDA with >20 latent variables (Fig. 3b).

Table 2 The recognition rates of soil types using GA-PLSDA

Full size table

In our study, the primary identifying NIR characteristic of various soil types were distributed in -OH region (7463-7037 cm⁻¹; 5263-5222 cm⁻¹; 4505-4503 cm⁻¹), organics (5292 cm⁻¹, 4320 cm⁻¹ and 4316 cm⁻¹), carbonate (4287 cm⁻¹) and illite (4094 cm⁻¹, 4050 cm⁻¹ and 4046 cm⁻¹). The -OH vibrations were predominant in each soil type, which is caused by vibrations of water bound in the interlayer lattices of clay minerals as hydrated cations and water adsorbed on particle surfaces^31,32,33. The carbonate absorptions occur near 4287 cm⁻¹ ^33,34, mainly occurring from Chernozems and Eutric Cambisols. NIR spectra may reflect the differences in biogeographical origin¹⁶, organic composition³³ and soil mineralogy. The relatively low rate of recognition accuracy for Haplic Luvisols (Table 1) was likely due to their geographic proximity with the Eutric Cambisls and Chemozems, or due to the fact that Haplic Luvisols shared pedogenesis (leached soils) with the Albic Luvisols^{31,32,33,34,35}. Albic Luvisols sampled from piedmont were mostly discriminated from other soil types by mineral composition rather than SOC content. The minerals in Albic Luvisols soil are mainly mica-derived illite³⁵. In general, relatively weak mineral weathering has occurred in northeast regions of China due to relatively dry and cold climatic conditions. As a result, minerals tend to be inherited from the parent material in those regions. Minerals of these five soils are typically clay mica (illite) (Albic Luvisols, Haplic Luvisols, Chernozems, Eutric Cambisols and Phaeozems), with small amounts of vermiculite (Haplic Luvisols and Eutric Cambisols soil), smectite (Eutric Cambisols, Chernozems), Kaolinite and unspecified minerals (Albic Luvisols soil)^35,36.

We used variable importance in project (VIP) scores to identify the features that were most important for characterizing each soil type. We defined VIP scores ≥1 as indicative of particularly important features for the model¹⁷. Among the 66 features selected by GA, there were 28, 30, 29, 32 and 28 with VIP scores ≥1 (solid points in Fig. 2a and Fig. 2b) for Albic Luvisols, Haplic Luvisols, Chernozems, Eutric Cambisols and Phaeozems, respectively. Features with high VIP scores indicate spectral regions that contributed substantially to the model³⁷. Some of these regions are associated with water, specific minerals, organic matter, or other identifiable aspects of the soil⁷. Our results showed that the high VIP features were distributed around 7065 cm⁻¹, 5222 cm⁻¹ and 4527 cm⁻¹. These important wavenumbers reflect the overtones and combinations in -OH, organics and minerals. Only 12 features had high VIP scores for all 5 soil types, suggesting that these soils may differ in the chemical structures that characterize them, rather than the relative amounts of a common set of structure.

The importance of variables selected by GA can also be evaluated by the magnitude of their coefficients in the GA-PLDSA model. As with VIP scores, soils had distinct patterns of coefficient magnitudes (Fig. 4). For example, Albic Luvisols had a concentration of features with large coefficients in the range of 4500 - 4000 cm⁻¹, while impactful features for Chernozems were evenly distributed across the spectrum. Variables whose coefficients have different signs in two soil types are particularly effective at discriminating between those soils.

When we re-sampled our training set, we verified that the PLSDA approach had consistently greater accuracy during calibration, but the GA-PLSDA approach produced higher accuracy during internal and external validation (Fig. 5). The subset of variables selected by the GA during our initial analysis performed well even with different training datasets (Fig. 5a). The median (with 95% CI) LOO validation and external validation classification accuracies for this GA-PLSDA were 85.6% (85.5%-85.8%) and 87.0% (86.8%-87.2%) respectively, a slight but consistent improvement over the corresponding accuracies of 84.3% (84.2%-84.5%) and 84.4% (84.2%-84.6%) obtained by PLSDA without variable selection. This indicates the features selected by the GA from the initial training data captured the key features from other possible training datasets. Recreating the GA variable selection process for each training dataset gave a virtually identical LOO validation accuracy of 85.6% (85.2%-86.0%), but external validation accuracy increased to 92.2% (91.6%-92.8%) (Fig. 5b). From this we conclude that the GA-PLSDA model building process is relatively robust to variance in the exact composition of the training and test datasets, although using variables appropriate to the training data results in substantial improvement. This provides additional evidence that GA variable selection identifies features of real and meaningful importance to the system being studied.

In summary, we have demonstrated the capacity of NIR spectral data coupled with PLSDA to recognize soils according to its category with a high degree of accuracy. The use of GA variable selection improved recognition accuracy through PLSDA and we demonstrated that variable selection was only minimally impacted by sampling effects. With the easily-obtained NIR spectrum of a soil, the already-established model is ready to be used as a reference tool to quickly distinguish between different soils types and place each soil type into its correct category. While our approach achieved very respectable recognition accuracy, there are several aspects of this method that could be improved. Although the method of maximum distance we used to classify individual samples is straightforward and easily implemented, other methods such as centroids or Mahalanobis distances between soil types might lead to improvements in classification accuracy. Alternative methods, such as nonlinear iterative partial least squares followed by linear discriminant analysis (NIPALS-LDA), may also perform better¹⁷. Data mining should always be explored to improve the accuracy and predictive models for soil type recognition, especially if more spectral data are available for the continued development across larger spatial scales. Finally, the GA method is based on stochastic processes and finding a globally optimal solution with a large number of variables and samples may require unreasonable computational resources. Although there may be only a local optimum, the set of variables selected by the GA greatly reduced the dimensionality of our input data, permitting us to focus our interpretation on a relatively narrow segment of the NIR spectrum.

Methods

Field description and soil properties

Soil samples (n = 230) were collected from farmland at 0-10 cm depth in northeastern China in fall 2011. Sampling was conducted over five counties including Dunhua, Changtu, Gongzhuling, Fuxin and Yushu, located at 41°N to 42°N latitude and 121°E to 128°E longitude, with the exact sampling locations shown in the map (Fig. 6). The climate is relatively cool and humid, with mean annual precipitation of 350–700 mm and mean annual air temperature of 3-9 °C for the five counties. Dominant soil types of this area are: Albic Luvisols, Haplic Luvisols, Chernozems, Eutric Cambisols and Phaeozems according to the FAO soil classification system³⁸. All studied soils were manually pre-identified and classified to provide as supervised information for model establishment. The soil samples were air dried, sieved through a 0.25 mm sieve and visible identifiable crop residues were manually removed for further analysis. We analyzed soil organic carbon using dry combustion method by an element analyzer (vario MACRO cube, Elementar Analysensysteme GMbH, Hanau, Germany) and soil texture using a pipette method³⁹.

NIR spectra measurements

NIR spectrum absorbance bands were from first overtone and combination bands of the fundamental vibrations in the mid-infrared region⁴⁰. Near infrared diffuse reflectance spectra were obtained from all soil samples (the average from three separate spectra for each sample) using a Thermo Nicolet 6700 spectrometer (Thermo Electron Scientific Instruments Corp., Madison, WI, USA). The background spectrum was eliminated automatically. The NIR spectra were recorded in the range of 7500 to 4000 cm^–1 with 64 scans and 4 cm^–1 resolution.

Model development

Model development was carried out in the R statistical environment (Version 3.1.1)⁴¹. PLSDA model development and cross-validation were conducted using the “pls” package⁴², while the GA procedure was implemented following the scheme proposed by Wehrens (2011)⁴³. In order to establish a valid and robust model, we randomly divided the soil sample data set into two parts: a training set for model development and cross-validation and a testing set for model external validation. For each soil type, samples were allocated between the training and testing sets along a 2:1 ratio, as described in the Supplementary. Since inherent features of all NIR spectra have no differences in magnitude, it was unnecessary to pretreat data prior to modeling.

PLSDA is a supervised classification method based on PLS regression¹⁸. In this study, the response matrix was conducted as a binary matrix having 5 columns with value 1 indicating group membership and 0 for non-membership. In order to predict values for multivariate response from a (potentially large) matrix of predictors; we used partial least squares regression (PLS2 algorithm), which calibrates all values in the response matrix simultaneously⁴⁴, to develop the PLSDA model. We used a simple maximum distance rule to identify the class membership of each sample, which in this study meant a sample was assigned to the soil type that has the largest predicted value from the PLS2 regression model. We assessed the quality of the PLSDA models incorporating from 1 to 30 latent variables by using internal validation (leave-one-out (LOO) cross-validation) and external validation. For both methods, we used percentage classification accuracy as the metric of model performance. Variable importance in projection (VIP) scores were used to estimate the importance of each variable in the projection used in a PLS model¹⁷.

Genetic algorithms (GA) have several basic steps: 1) coding of the variable; 2) initiation of population; 3) evolution of the response; 4) reproduction; 5) mutation and 6) repeat the process until a stopping criterion is reached. Generally, the stopping criterion includes a maximum number of generations, a maximum target outcome value for the fitness or a set number of generations⁴⁵. In this study, a simple GA procedure was used to select the minimum features important to PLSDA from the full set of NIR features. A very clear description of this process is given in section 2.4 of Ramadan et al.³⁰. Briefly, we constructed a “population” consisting of sets of NIR features to include. Each subset was used in a PLSDA with a set number of latent variables and assigned a fitness based on the following function:

where v is the set of features included, r_calibration is the classification accuracy obtained during PLSDA calibration and r_LOO is the classification accuracy during LOO cross-validation. The variable sets with the highest fitness would be recombined with other fit sets to construct the next generation of the population. The resulting “offspring” would undergo “mutation” via random addition or removal of individual features to prevent the population from becoming trapped at a local optimum. The process continued until either a predetermined fitness value was reached or a maximum number of generations had elapsed. Each subset contained at least one more feature than the number of latent variables to be included in the subsequent PLSDA, with a maximum of 75 features. We used a population size of 50, a maximum of 20 generations and a mutation probability of 0.05 per feature per generation.

Model comparison

We evaluated the robustness of variable selection by GA to sampling biases by replicating the model fitting process 1000 times using re-sampled training datasets. For our test case (Scheme 1), we used a single set of GA-selected features with all training datasets, while in baseline case (Scheme 2), we used the GA to select a new set of features for each training dataset. Each of these cases was compared to PLSDA models using all features. All models fit during this processes used 26 latent variables.

Additional Information

How to cite this article: Xie, H. et al. Soil type recognition as improved by genetic algorithm-based variable selection using near infrared spectroscopy and partial least squares discriminant analysis. Sci. Rep. 5, 10930; doi: 10.1038/srep10930 (2015).

References

Ludwig, B., Nitschke, R., Terhoeven-Urselmans, T., Michel, K. & Flessa, H. Use of mid-infrared spectroscopy in the diffuse-reflectance mode for the prediction of the composition of organic matter in soil and litter. J. Plant Nutr. and Soil Sci. 171, 384–391, 10.1002/jpln.200700022 (2008).
Article CAS Google Scholar
Viscarra Rossel, R. A., Walvoort, D. J. J., McBratney, A. B., Janik, L. J. & Skjemstad, J. O. Visible, near infrared, mid infrared or combined diffuse reflectance spectroscopy for simultaneous assessment of various soil properties. Geoderma 131, 59–75, 10.1016/j.geoderma.2005.03.007 (2006).
Article ADS CAS Google Scholar
Awiti, A. O., Walsh, M. G., Shepherd, K. D. & Kinyamario, J. Soil condition classification using infrared spectroscopy: A proposition for assessment of soil condition along a tropical forest-cropland chronosequence. Geoderma 143, 73–84, 10.1016/j.geoderma.2007.08.021 (2008).
Article ADS CAS Google Scholar
Du, C., Linker, R. & Shaviv, A. Identification of agricultural Mediterranean soils using mid-infrared photoacoustic spectroscopy. Geoderma 143, 85–90, 10.1016/j.geoderma.2007.10.012 (2008).
Article ADS CAS Google Scholar
Reeves III, J. B., McCarty, G. W. & Meisinger, J. J. Near infrared reflectance spectroscopy for the analysis of agricultural soils. J. Near Infrared Spectrosc. 7, 179–193, 10.1255/jnirs.248 (1999).
Article ADS CAS Google Scholar
Chang, C. W., Laird, D. A., Mausbach, M. J. & Hurburgh, C. R. Near-infrared reflectance spectroscopy-principal components regression analyses of soil properties. Soil Sci. Soc. Am. J. 65, 480–490, 10.2136/sssaj2001.652480x (2001).
Article ADS CAS Google Scholar
Viscarra Rossel, R. A. & Behrens, T. Using data mining to model and interpret soil diffuse reflectance spectra. Geoderma 158, 46–54, 10.1016/j.geoderma.2009.12.025 (2010).
Article ADS CAS Google Scholar
D’Acqui, L. P., Pucci, A. & Janik, L. J. Soil properties prediction of western Mediterranean islands with similar climatic environments by means of mid-infrared diffuse reflectance spectroscopy. Eur. J. Soil Sci. 61, 865–876, 10.1111/j.1365-2389.2010.01301.x (2010).
Article CAS Google Scholar
Soriano-Disla, J. M., Janik, L. J., Rossel, R. A. V., Macdonald, L. M. & McLaughlin, M. J. The performance of visible,near- and mid-infrared reflectance spectroscopy for prediction of soil physical, chemical and biological properties. Appl. Spectrosc. Rev. 49, 139–186, 10.1080/05704928.2013.811081 (2014).
Article ADS CAS Google Scholar
Berrueta, L. A., Alonso-Salces, R. M. & Heberger, K. Supervised pattern recognition in food analysis. J. Chromatogr. A 1158, 196–214, 10.1016/j.chroma.2007.05.024 (2007).
Article CAS PubMed Google Scholar
Lavine, B. K. Pattern recognition. Crit. Rev. Anal. Chem. 36, 153–161, 10.1080/10408340600969411 (2006).
Article CAS Google Scholar
Dragovic, S. & Onjia, A. Classification of soil samples according to their geographic origin using gamma-ray spectrometry and principal component analysis. J. Environ. Radioact. 89, 150–158, 10.1016/j.jenvrad.2006.05.002 (2006).
Article CAS PubMed Google Scholar
Linker, R. Soil classification via mid-infrared spectroscopy. IFIP International Federation for Information Processing. 259, 1137–1146 (2008).
Article Google Scholar
Ramadan, Z., Song, X. H., Hopke, P. K., Johnson, M. J. & Scow, K. M. Variable selection in classification of environmental soil samples for partial least square and neural network models. Anal. Chim. Acta 446, 231–242, 10.1016/s0003-2670(01)00999-0 (2001).
Article Google Scholar
Moros, J., Martínez-Sánchez, M. J., Pérez-Sirvent, C., Garrigues, S. & de la Guardia, M. Testing of the region of Murcia soils by near infrared diffuse reflectance spectroscopy and chemometrics. Talanta 78, 388–398, 10.1016/j.talanta.2008.11.041 (2009).
Article CAS PubMed Google Scholar
Dragovic, S. & Onjia, A. Classification of soil samples according to geographic origin using gamma-ray spectrometry and pattern recognition methods. Appl. Radiat. Isotopes 65, 218–224, 10.1016/j.apradiso.2006.07.005 (2007).
Article CAS Google Scholar
Baron, M., Gonzalez-Rodriguez, J., Croxton, R., Gonzalez, R. & Jimenez-Perez, R. Chemometric study on the forensic discrimination of soil types using their infrared spectral characteristics. Appl. Spectrosc. 65, 1151–1161, 10.1366/10-06197 (2011).
Article ADS CAS PubMed Google Scholar
Barker, M., Rayens, W. Partial least squares for discriminantion. J. Chemometrics 17, 166–173, 10.1002/cem.785 (2003).
Article CAS Google Scholar
Westerhuis, J. A. et al. Assessment of PLSDA cross validation. Metabolomics 4, 81–89, 10.1007/s11306-007-0099-6 (2008).
Article CAS Google Scholar
So, C. F., Choi, K. S., Chung, J. W. & Wong, T. K. An extension to the discriminant analysis of near-infrared spectra. Med. Eng. Phys. 35, 172–177, 10.1016/j.medengphy.2012.04.012 (2013).
Article CAS PubMed Google Scholar
Zou, X. B., Zhao, J. W., Povey, M. J. W., Holmes, M. & Mao, H. P. Variables selection methods in near-infrared spectroscopy. Anal. Chim. Acta 667, 14–32, 10.1016/j.aca.2010.03.048 (2010).
Article CAS Google Scholar
Guyon, I. & Elisseeff, A. An introduction to variable and feature selection. J. Mach. Learn. Res. 3, 1157–11812, 10.1162/153244303322753616 (2003).
Article MATH Google Scholar
Andersen, C. M. & Bro, R. Variable selection in regression-a tutorial. J. Chemometrics 24, 728–737, 10.1002/cem.1360 (2010).
Article CAS Google Scholar
Balabin, R. M. & Smirnov, S. V. Variable selection in near-infrared spectroscopy: benchmarking of feature selection methods on biodiesel data. Anal. Chim. Acta 692, 63–72, 10.1016/j.aca.2011.03.006 (2011).
Article CAS PubMed Google Scholar
Silva, A. C., Lira Pontes, L. F. B., Pimentel, M. F. & Pontes, M. J. C. Detection of adulteration in hydrated ethyl alcohol fuel using infrared spectroscopy and supervised pattern recognition methods. Talanta 93, 129–134, 10.1016/j.talanta.2012.01.060 (2012).
Article CAS PubMed Google Scholar
Devos, O., Downey, G. & Duponchel, L. Simultaneous data pre-processing and SVM classification model selection based on a parallel genetic algorithm applied to spectroscopic data of olive oils. Food Chem. 148, 124–130, 10.1016/j.foodchem.2013.10.020 (2014).
Article CAS Google Scholar
Xing, J., Guyer, D., Ariana, D. & Lu, R. Determining optimal wavebands using genetic algorithm for detection of internal insect infestation in tart cherry. Sens. Instrum. Food Qual. Saf. 2, 161–167, 10.1007/s11694-008-9047-z (2008).
Article Google Scholar
Niazi, A. & Leardi, R. Genetic algorithms in chemometrics. J. Chemometrics 26, 345–351, 10.1002/cem.2426 (2012).
Article CAS Google Scholar
Arakawa, M., Yamashita, Y. & Funatsu, K. Genetic algorithm-based wavelength selection method for spectral calibration. J. Chemometrics 25, 10–19, 10.1002/cem.1339 (2011).
Article CAS Google Scholar
Ramadan, Z., Jacobs, D., Grigorov, M. & Kochhar, S. Metabolic profiling using principal component analysis, discriminant partial least squares and genetic algorithms. Talanta 68, 1683–1691, 10.1016/j.talanta.2005.08.042 (2006).
Article CAS PubMed Google Scholar
Bishop, J. L., Lane, M. D., Dyar, M. D. & Brown, A. J. Reflectance and emission spectroscopy study of four groups of phyllosilicates: smectites, kaolinite-serpentines, chlorites and micas. Clay Miner. 43, 35–54, 10.1180/claymin.2008.043.1.03 (2008).
Article ADS CAS Google Scholar
Xie, H. T., Yang, X. M., Drury, C. F., Yang, J. Y. & Zhang, X. D. Predicting soil organic carbon and total nitrogen using mid- and near-infrared spectra for Brookston clay loam soil in Southwestern Ontario, Canada. Can. J. Soil Sci. 91, 53–63, 10.4141/cjss10029 (2011).
Article CAS Google Scholar
Viscarra Rossel, R. A. & Webster, R. Discrimination of Australian soil horizons and classes from their visible-near infrared spectra. Eur. J. Soil Sci. 62, 637–647, 10.1111/j.1365-2389.2011.01356.x (2011).
Article Google Scholar
Clark, R. N., King, T. V. V., Klejwa, M., Swayze, G. A. & Vergo, N. High spectral resolution reflectance spectroscopy of minerals. J. Geophys. Res. 95, 126503, 10.1029/JB095iB08p12653 (1990).
Article Google Scholar
Xie, P. R. Chemical and mineral properties of soils in Northeast China (Science Press, Beijing, 2010) (In Chinese).
Buol, S. W., Southard, R. J., Graham, R. C. & McDaniel, P. A. Soil genesis and classification-5th Edition (Wiley-Blackwell, New York, 2003).
Kuligowski, J., Quintas, G., Herwig, C. & Lendl, B. A rapid method for the differentiation of yeast cells grown under carbon and nitrogen-limited conditions by means of partial least squares discriminant analysis employing infrared micro-spectroscopic data of entire yeast cells. Talanta 99, 566–573, 10.1016/j.talanta.2012.06.036 (2012).
Article CAS PubMed PubMed Central Google Scholar
IUSS Working Group WBR. World reference base for soil resources, in World Soil Resources Reports No. 106 (FAO, Rome, 2014).
Gee, G. W., Or, D. Particle-size analysis. in Mehtods of Soil Analysis, Part 4: Physical Mehtods (Ed. Dane, J. H. & Topp, G. C. ) 383–411 (American Society of Agronomy and Soil Science Society of America, Madison, WI, USA, 2002).
Stenberg, B., Viscarra Rossel, R. A., Mouazen, A. M. & Wetterlind, J. in Advances in Agronomy Vol. 107 (ed L. Sparks Donald ) 163–215 (Academic Press, Waltham, MA, USA 2010).
Article CAS Google Scholar
R Development Core Team. R: A language and environment for statistical computing. R Foundation for Statitical Computing, Vienna, Austria URL http://www.R-project.org (2014)(Date of access: 20/07/2014).
Mevik, B.-H. & Wehrens, R. The pls package: Principal component and partial least squares regression in R. J. Stat. Softw. 18, 1–23, 10.1.1.357.2177 (2007).
Article Google Scholar
Wehrens, R. Chemometrics with R: multivariate data analysis in the natural sciences and life sciences (Springer, Heidelberg, 2011).
Galtier, O. et al. Comparison of PLS1-DA, PLS2-DA and SIMCA for classification by origin of crude petroleum oils by MIR and virgin olive oils by NIR for different spectral regions. Vib. Spectrosc. 55, 132–140, 10.1016/j.vibspec.2010.09.012 (2011).
Article CAS Google Scholar
Jarvis, R. M. & Goodacre, R. Genetic algorithm optimization for pre-processing and variable selection of spectroscopic data. Bioinformatics 21, 860–868, 10.1093/bioinformatics/bti102 (2005).
Article CAS PubMed Google Scholar

Download references

Acknowledgements

We thank the National Natural Science Foundation of China (No. 41171199) and the “Strategic Priority Research Program-Climate Change: Carbon Budget and Relevant Issues” of the Chinese Academy of Sciences (No. XDA05050501) for their support.

Author information

Xie Hongtu and Zhao Jinsong contributed equally to this work.

Authors and Affiliations

State Key Laboratory of Forest and Soil Ecology, Institute of Applied Ecology, Chinese Academy of Sciences, Shenyang, 110164, China
Hongtu Xie, Xudong Zhang & Chao Liang
Key Laboratory of Pollution Ecology and Environmental Engineering, Institute of Applied Ecology, Chinese Academy of Sciences, Shenyang, 110164, China
Hongtu Xie
College of Resources and Environment, Huazhong Agricultural University, Wuhan, 430070, China
Jinsong Zhao
College of Land & Environment, Shenyang Agricultural University, Shenyang, 110866, China
Qiubing Wang & Jingkuan Wang
Key Laboratory of Mollisols Agroecology, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Haerbin, 150081, China
Yueyu Sui
Greenhouse and Processing Crops Research Centre, Agriculture & Agri-Food Canada, Harrow, N0R 1G0, Ontario, Canada
Xueming Yang
National Field Research Station of Shenyang Agroecosystems, Shenyang, 110016, China
Xudong Zhang
Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
Chao Liang

Authors

Hongtu Xie
View author publications
You can also search for this author in PubMed Google Scholar
Jinsong Zhao
View author publications
You can also search for this author in PubMed Google Scholar
Qiubing Wang
View author publications
You can also search for this author in PubMed Google Scholar
Yueyu Sui
View author publications
You can also search for this author in PubMed Google Scholar
Jingkuan Wang
View author publications
You can also search for this author in PubMed Google Scholar
Xueming Yang
View author publications
You can also search for this author in PubMed Google Scholar
Xudong Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Chao Liang
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

Project planning and design: H.X., J.Z. and C.L.; Sampling collection: Q.W., Y.S. and J.W.; Near infrared spectroscopy analysis: H.X.; Model construction: J.Z.; Paper construction: X.Y., X.Z. and C.L.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Electronic supplementary material

Supplementary Information

Supplementary Dataset 1

Supplementary Dataset 2

Rights and permissions

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

Reprints and permissions

About this article

Cite this article

Xie, H., Zhao, J., Wang, Q. et al. Soil type recognition as improved by genetic algorithm-based variable selection using near infrared spectroscopy and partial least squares discriminant analysis. Sci Rep 5, 10930 (2015). https://doi.org/10.1038/srep10930

Download citation

Received: 22 August 2014
Accepted: 23 April 2015
Published: 18 June 2015
DOI: https://doi.org/10.1038/srep10930

This article is cited by

Soil type identification model using a hybrid computer vision and machine learning approach
- Bekalu Tadele Abeje
- Ayodeji Olalekan Salau
- Abrham Debasu Mengistu
Multimedia Tools and Applications (2024)

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.