Tutorial: multivariate classification for vibrational spectroscopy in biological samples

Abstract

Vibrational spectroscopy techniques, such as Fourier-transform infrared (FTIR) and Raman spectroscopy, have been successful methods for studying the interaction of light with biological materials and facilitating novel cell biology analysis. Spectrochemical analysis is very attractive in disease screening and diagnosis, microbiological studies, and forensic and environmental investigations because of its low cost, minimal sample preparation, non-destructive nature and substantially accurate results. However, there is now an urgent need for multivariate classification protocols that allow biologically derived spectrochemical data to be analyzed accurately and reliably. Multivariate classification comprises discriminant analysis and class-modeling techniques in which multiple spectral variables are analyzed in conjunction to distinguish unknown samples and assign them to pre-defined groups. The need for such protocols is demonstrated by the fact that applications of deep-learning algorithms to complex datasets are being increasingly recognized as critical for extracting important information and visualizing it in a readily interpretable form. Here, we provide a tutorial for multivariate classification analysis of vibrational spectroscopy data (FTIR, Raman and near-IR), highlighting a series of critical steps, such as preprocessing, data selection, feature extraction, classification and model validation. This is an essential step toward the construction of a practical spectrochemical analysis model for biological analysis in real-world applications, where fast, accurate and reliable classification models are fundamental.
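The steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' MATLAB code (that is in the Supplementary Method), and every name and value in it is an assumption made for illustration: the spectra are synthetic, preprocessing is limited to standard normal variate (SNV) scaling, data selection uses a Kennard-Stone split applied to each class separately, a nearest-centroid rule stands in for the discriminant models the tutorial covers, feature extraction is omitted, and validation is reduced to sensitivity and specificity on the held-out samples.

```python
import math

def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own mean and std."""
    mean = sum(spectrum) / len(spectrum)
    std = math.sqrt(sum((v - mean) ** 2 for v in spectrum) / (len(spectrum) - 1))
    return [(v - mean) / std for v in spectrum]

def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def kennard_stone(X, n_select):
    """Return indices of n_select rows of X chosen by the Kennard-Stone rule."""
    n = len(X)
    dist = [[euclidean(X[i], X[j]) for j in range(n)] for i in range(n)]
    # Start from the two most distant samples.
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    first, second = max(pairs, key=lambda p: dist[p[0]][p[1]])
    selected = [first, second]
    while len(selected) < n_select:
        remaining = [i for i in range(n) if i not in selected]
        # Add the sample whose nearest already-selected neighbour is farthest away.
        selected.append(max(remaining, key=lambda i: min(dist[i][s] for s in selected)))
    return selected

def fit_centroids(X, y):
    """Mean spectrum per class (a simple stand-in classifier, not PLS-DA)."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict(centroids, x):
    return min(centroids, key=lambda label: euclidean(centroids[label], x))

def sensitivity_specificity(y_true, y_pred, positive):
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return tp / (tp + fn), tn / (tn + fp)

def toy_spectrum(peak, scale, offset):
    """Hypothetical 8-point spectrum: one Gaussian band plus a baseline offset."""
    return [offset + scale * math.exp(-0.5 * (k - peak) ** 2) for k in range(8)]

# Synthetic data: two classes that differ in band position (assumed values).
raw, labels = [], []
for i in range(6):
    raw.append(toy_spectrum(2.0 + 0.1 * i, 1.0 + 0.5 * i, 0.2 * i))
    labels.append("control")
    raw.append(toy_spectrum(5.0 + 0.1 * i, 1.0 + 0.5 * i, 0.2 * i))
    labels.append("disease")

# 1) Preprocessing
X = [snv(s) for s in raw]

# 2) Data selection: Kennard-Stone within each class, 4 training samples per class
train_idx = []
for label in ("control", "disease"):
    members = [i for i, lab in enumerate(labels) if lab == label]
    picked = kennard_stone([X[i] for i in members], 4)
    train_idx += [members[k] for k in picked]
test_idx = [i for i in range(len(X)) if i not in train_idx]

# 3) Classification and 4) validation on the held-out samples
centroids = fit_centroids([X[i] for i in train_idx], [labels[i] for i in train_idx])
y_true = [labels[i] for i in test_idx]
y_pred = [predict(centroids, X[i]) for i in test_idx]
sens, spec = sensitivity_specificity(y_true, y_pred, positive="disease")
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Splitting within each class keeps both classes represented in the training and test sets. Real spectra would normally also need the outlier screening and feature extraction steps that this sketch leaves out.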

Fig. 4: Outlier detection test by a Hotelling’s T2 versus Q residuals chart.
Fig. 5: Results for PLS-DA models in datasets 1–3.

Data availability

The datasets generated and/or analyzed during the current study are available in the IRootLab toolbox (http://trevisanj.github.io/irootlab/), in the Figshare repository (https://doi.org/10.6084/m9.figshare.6744206.v1), and in the Eigenvector Research repository (http://www.eigenvector.com/data/Corn/index.html).

Code availability

The MATLAB code and instructions on how to process the data are presented in the Supplementary Method.

Acknowledgements

C.L.M.M. thanks Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)–Brazil (grant 88881.128982/2016-01) for financial support.

Author information

Contributions

F.L.M. is the principal investigator who conceived and developed the idea for the article. C.L.M.M. performed the data analysis and wrote the manuscript. K.M.G.L. and M.S. contributed recommendations and provided feedback and changes to the manuscript. C.L.M.M. and F.L.M. brought together the text and finalized the manuscript.

Corresponding authors

Correspondence to Camilo L. M. Morais or Francis L. Martin.

Ethics declarations

Competing interests

Both F.L.M. and C.L.M.M. are shareholders in Biocel UK Ltd., a company for which M.S. is Director. Since submission of this article, Biocel UK Ltd. has sought to develop data analytic tools as a service for commercial gain; some of these might be based on methodologies demonstrated in this manuscript. F.L.M. is also seeking a research and development position within Biocel UK Ltd.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

Key references using this tutorial

Lilo, T. et al. Anal. Bioanal. Chem. 412, 1077–1086 (2020): https://doi.org/10.1007/s00216-019-02332-w

Maitra, I. et al. J. Biophotonics 13, e201960132 (2019): https://doi.org/10.1002/jbio.201960132

Maitra, I. et al. Analyst 144, 7447–7456 (2019): https://doi.org/10.1039/C9AN01749F

Supplementary information

Supplementary Information

Supplementary Figures 1–9 and Supplementary Method.

Reporting Summary

About this article

Cite this article

Morais, C.L.M., Lima, K.M.G., Singh, M. et al. Tutorial: multivariate classification for vibrational spectroscopy in biological samples. Nat Protoc 15, 2143–2162 (2020). https://doi.org/10.1038/s41596-020-0322-8

  • DOI: https://doi.org/10.1038/s41596-020-0322-8
