FTIR differentiation based on genomic DNA for species identification of Shigella isolates from stool samples

Shigellosis, caused by four species of Shigella, is one of the major public health concerns in developing and low-income countries. There is a clear need for rapid, cost-effective, sensitive and specific methods to differentiate Shigella species for use in outbreaks and health surveillance systems. We developed a sensitive and specific Fourier-transform infrared spectroscopy (FTIR) based method, followed by principal component analysis (PCA) and hierarchical clustering analysis (HCA), to differentiate four species of Shigella isolated from stool samples. The FTIR based method was evaluated by differentiating 91 Shigella isolates from clinical samples using both the gold standards (culture-based and agglutination methods) and the developed FTIR assay; the sensitivity and specificity of the developed method were then calculated. In summary, four distinct FTIR spectra associated with the four species of Shigella were obtained, with wide variations in three definite regions (1800–1550 cm−1, 1550–1100 cm−1 and 1100–800 cm−1) that distinguish these species from each other. The FTIR method followed by PCA achieved specificity, sensitivity, differentiation error and correct differentiation rate values of 100, 100, 0 and 100%, respectively, for identification and differentiation of all Shigella species in stool samples.

www.nature.com/scientificreports/
methods also fall into two main groups: those based on gene amplification, including all PCR based assays, and non-amplification DNA fingerprinting methods, namely FTIR and Raman spectroscopy techniques 9 . So far, few assays have been designed and developed to identify and discriminate the different species of Shigella from each other 10 . Some molecular methods, including MALDI-TOF MS 11 , conventional multiplex PCR 5 , LC-MS 12 , immunocapture PCR 13 , and NGS 10 techniques, have been developed and used to differentiate the four Shigella species isolated from clinical, food and environmental samples. It has also been shown that some standard molecular techniques, such as MALDI-TOF MS and 16S rRNA gene sequencing, were unable to differentiate Shigella species from each other or from enteroinvasive E. coli strains 14 . FTIR spectroscopy is a relatively cost-effective, rapid, convenient, and precise analytical technique that can reflect DNA structure and composition 15 . FTIR is used to obtain an infrared emission or absorption spectrum of a liquid, solid or gas. An FTIR spectrometer collects high-resolution spectral data simultaneously over a wide spectral range. The technique identifies chemical bonds in different molecules by producing an infrared absorption spectrum. FTIR spectroscopy has been used as a powerful tool for species identification and differentiation of eukaryotic and prokaryotic cells based on genomic DNA characterization and barcoding 16 . Because FTIR spectral data are too complex to interpret directly, multivariate statistical and dimension reduction methods such as principal components analysis (PCA), hierarchical clustering analysis (HCA), partial least squares (PLS) and artificial neural networks (ANN) have been used to interpret the results 17 .
Several researchers have used and recommended FTIR followed by statistical analysis (mostly PCA and HCA) as a rapid, simple, relatively cheap, precise, sensitive, specific and convenient method for distinguishing species of microbial pathogens isolated from clinical specimens based on structural differences in their genomic DNA 18 . So far, this method has not been employed to differentiate Shigella species isolated from clinical samples. Shigellosis is now responsible for more than fifty thousand deaths annually among children around the world, and cost-effective, rapid differentiation of the four Shigella species is critical for the investigation of shigellosis outbreaks 2 . The purpose of this study was therefore to design and develop an FTIR assay followed by statistical analysis to identify and differentiate four species of Shigella isolated from stool samples.

Results
FTIR spectral data. In this study, we developed an FTIR spectroscopic assay as a DNA barcoding method to differentiate four species of Shigella from each other. A total of 91 Shigella isolates were collected from 1862 stool specimens, comprising 18 S. dysenteriae, 25 S. flexneri, 23 S. boydii and 25 S. sonnei isolates; culture and biochemical based methods and serologic tests were used as the gold standard methods to detect, identify and serotype Shigella species (Table 1). The FTIR spectra of the DNA extracted from the reference strains of the four species, S. dysenteriae, S. flexneri, S. boydii and S. sonnei, are illustrated in Fig. 1. Four distinctly and significantly different FTIR spectra were observed for the DNA of the four species of Shigella strains, allowing these species to be differentiated from each other. Comparison of the FTIR spectra of DNA from the different Shigella species indicated significant variations in spectral characteristics in three definite regions: 1800–1550 cm−1, 1550–1100 cm−1 and 1100–800 cm−1. Also, prominent IR absorption marker bands were observed at 1715, 1689, 1622, 1481, 1436, 1320, 1223, 1175, 1058, 955, 866 and 845 cm−1. Consequently, we observed significant variations in all tested regions.
FTIR followed by PCA assay. Dimension reduction and multivariate statistical methods, including PCA and HCA assays, were used in this study to further differentiate the FTIR spectra of DNA from the different species of Shigella reference strains and isolates collected from stool samples. Before analysis of the FTIR spectra by PCA, Bartlett's test of sphericity and the Kaiser-Meyer-Olkin (KMO) test were carried out. The P-value of Bartlett's test of sphericity was < 0.001 (reported as 0.000) and the KMO measure was 0.94, showing that the FTIR spectral data were suitable for PCA analysis.
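As an illustration of the suitability check mentioned above, the following is a minimal sketch of Bartlett's test of sphericity using only the Python standard library. It computes the usual statistic, chi-square = -[(n - 1) - (2p + 5)/6] ln|R| with p(p - 1)/2 degrees of freedom, where R is the correlation matrix of p variables measured on n samples. The function names and the toy data are hypothetical and are not taken from the study; the P-value lookup against the chi-square distribution is left out, and the sketch assumes fewer variables than samples so that R is not singular.

```python
import math

def correlation_matrix(rows):
    """Pearson correlation matrix of the columns of `rows` (samples x variables)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    norms = [math.sqrt(sum(c[j] ** 2 for c in centered)) for j in range(p)]
    return [[sum(c[i] * c[j] for c in centered) / (norms[i] * norms[j])
             for j in range(p)] for i in range(p)]

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    size, det = len(a), 1.0
    for k in range(size):
        pivot = max(range(k, size), key=lambda i: abs(a[i][k]))
        if abs(a[pivot][k]) < 1e-12:
            return 0.0
        if pivot != k:
            a[k], a[pivot] = a[pivot], a[k]
            det = -det
        det *= a[k][k]
        for i in range(k + 1, size):
            factor = a[i][k] / a[k][k]
            for j in range(k, size):
                a[i][j] -= factor * a[k][j]
    return det

def bartlett_sphericity(rows):
    """Return (chi-square statistic, degrees of freedom) for Bartlett's test."""
    n, p = len(rows), len(rows[0])
    det_r = determinant(correlation_matrix(rows))
    if det_r <= 0:
        raise ValueError("correlation matrix is singular; reduce the variables")
    statistic = -((n - 1) - (2 * p + 5) / 6) * math.log(det_r)
    return statistic, p * (p - 1) // 2

# Hypothetical toy data: four samples, two strongly correlated variables.
TOY_ROWS = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9]]
```

A large statistic relative to the chi-square distribution with the returned degrees of freedom means the variables are correlated enough for PCA to be worthwhile, which is the sense in which the test is used above.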
A 3D score plot was generated using the first three principal components (PCs), PC1, PC2 and PC3, which accounted for 40.24, 34.31 and 7.18% of the total variation, respectively (81.73% in total; Fig. 2). As can be seen in Fig. 2, all four species of Shigella strains and isolates were identified and distinguished from each other successfully; the genomic DNA of the different species was completely discriminated and did not overlap in the plot. All 18 S. dysenteriae, 25 S. flexneri, 23 S. boydii and 25 S. sonnei isolates were correctly identified by the FTIR method followed by PCA assay, showing that PCA was able to differentiate and classify the different species of Shigella strains. The developed method could not distinguish the serotypes of Shigella species from each other in this study.
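The percentages of total variation quoted for each PC are explained-variance ratios, that is, each eigenvalue of the covariance matrix divided by the sum of all eigenvalues. The sketch below shows one way to obtain scores and ratios with plain Python, using power iteration with deflation; this is an illustrative choice, not the software used in the study. The "spectra" are synthetic absorbance values for two made-up groups, and all names are hypothetical.

```python
import math

# Synthetic absorbances at four wavenumbers: rows 0-2 and rows 3-5 mimic two species.
SYNTHETIC_SPECTRA = [
    [1.00, 0.20, 0.10, 0.50], [1.10, 0.25, 0.12, 0.48], [0.90, 0.18, 0.09, 0.52],
    [0.20, 1.00, 0.60, 0.50], [0.25, 1.10, 0.58, 0.47], [0.18, 0.90, 0.62, 0.53],
]

def center(rows):
    """Subtract the column mean from every column."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[r[j] - means[j] for j in range(p)] for r in rows]

def covariance(centered):
    """Sample covariance matrix of centered data."""
    n, p = len(centered), len(centered[0])
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]

def top_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and eigenvector of a symmetric matrix by power iteration."""
    p = len(matrix)
    vec = [float(i + 1) for i in range(p)]
    norm = math.sqrt(sum(v * v for v in vec))
    vec = [v / norm for v in vec]
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm < 1e-12:
            break
        vec = [v / norm for v in nxt]
    value = sum(vec[i] * sum(matrix[i][j] * vec[j] for j in range(p))
                for i in range(p))
    return value, vec

def pca(rows, n_components=2):
    """Return (scores, explained-variance ratios) for the leading components."""
    centered = center(rows)
    cov = covariance(centered)
    p = len(cov)
    total_variance = sum(cov[i][i] for i in range(p))
    components, ratios = [], []
    for _ in range(n_components):
        value, vec = top_eigenpair(cov)
        components.append(vec)
        ratios.append(value / total_variance)
        # Deflation: remove the component just found before searching for the next.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(p)]
               for i in range(p)]
    scores = [[sum(c[j] * comp[j] for j in range(p)) for comp in components]
              for c in centered]
    return scores, ratios
```

On this synthetic input the PC1 scores of the two groups fall on opposite sides of zero, which is the kind of non-overlapping separation a score plot is meant to reveal.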
FTIR followed by HCA assay. The Euclidean distance was used to calculate the linkage clustering values for hierarchical clustering and the similarity measures among the FTIR spectral data of the genomic DNA of the different Shigella species. Figure 3 shows the dendrogram of the HCA results obtained from the FTIR spectral data to identify and differentiate four species of Shigella isolated from stool samples in this study. We considered 50 and 75% similarity cut-offs, at which 4 (A1-A4) and 7 (H1-H7) major clades were recognized, respectively. At the 50% similarity cut-off, S. sonnei and S. boydii isolates were differentiated into two separate clades (A1 and A2); however, S. dysenteriae and S. flexneri isolates were grouped in a single clade (A4). At the 75% similarity cut-off, on the other hand, S. dysenteriae and S. flexneri isolates were categorized into two distinct groups (H6 and H7), but four different clades (H1-H4) were recognized for S. sonnei and S. boydii isolates. At both the 50 and 75% similarity cut-offs, one of the S. boydii isolates was grouped into a separate single clade (A3 or H5). As a result, when the HCA assay was applied to the FTIR spectral data of the genomic DNA of Shigella species at both the 50 and 75% similarity cut-offs, only one of the S. boydii isolates (isolate B6, clades A3 and H5) was placed in a clade of its own, indicating that HCA also has the potential to be used to identify and classify the different species of Shigella strains.
As shown in Table 2, the results of the FTIR spectral data using PCA and HCA assays showed that identification of the four species of Shigella isolates from stool samples by the FTIR method followed by PCA assay was the best, with specificity, sensitivity, differentiation error and correct differentiation rate values of 100%, 100%, 0% and 100%, respectively, for all species of Shigella isolates.
The specificity, sensitivity, differentiation error and correct differentiation rate values of FTIR analysis with HCA assay were 100%, 95.65%, 1.09% and 98.9%, respectively, for identification and differentiation of S. boydii isolates. However, the FTIR coupled with the HCA assay developed in this study was not capable of differentiating Shigella serotypes from each other.
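To make the effect of a similarity cut-off concrete, the sketch below performs agglomerative clustering on Euclidean distances and stops merging once the closest pair of clusters is less similar than the cut-off. Two points are assumptions made only for illustration, since the text specifies neither: average linkage is used, and a similarity of s% is converted to a distance threshold of (1 - s/100) times the largest pairwise distance. The data points and function names are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical 2-D feature vectors: indices 0-2 and 3-5 form two tight groups.
SYNTHETIC_POINTS = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
]

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(c1, c2, points):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def clusters_at_similarity(points, similarity_pct):
    """Merge clusters until the closest pair falls below the similarity cut-off."""
    d_max = max(euclidean(a, b) for a, b in combinations(points, 2))
    threshold = (1 - similarity_pct / 100) * d_max  # assumed conversion
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > 1:
        (i, j), dist = min(
            (((i, j), average_linkage(clusters[i], clusters[j], points))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if dist > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return sorted(sorted(c) for c in clusters)
```

Raising the cut-off lowers the distance threshold and therefore yields more, smaller clusters, which matches the pattern reported above of 4 clades at 50% and 7 clades at 75% similarity.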

Discussion
Shigella remains a main cause of mortality and morbidity and one of the most important communicable pathogens causing diarrhoea among infants and young children around the world 19 . Shigellosis, caused by different Shigella species, usually occurs as occasional outbreaks and sporadic cases in some developed and industrial countries and leads to sporadic and epidemic gastrointestinal diseases in developing and low-income countries 20 . Four species of Shigella cause mild to severe diarrhoea in humans. Since shigellosis is highly infectious, rapid and cost-effective methods are needed to identify and discriminate the different species of Shigella so that outbreaks can be controlled and limited effectively 21 . Notably, classical and conventional methods for identifying pathogens in clinical and environmental samples, especially Shigella species, are generally time-consuming and expensive and have low specificity and sensitivity 22 . Hence, to address this problem, we developed an FTIR spectroscopy method followed by PCA and HCA assays, which offers sensitivity, specificity, speed and cost-effectiveness, to differentiate the genomic DNA of Shigella species isolated from stool samples. However, FTIR based methods for differentiation of Shigella species require specific and expensive equipment and are relatively challenging to perform 23 . The spectral data of the genomic DNA of Shigella species were recorded between 4000 and 400 cm−1 and analyzed by PCA and HCA assays in the range 1800–800 cm−1 (a span of 1000 cm−1), which was previously recommended for characterization of DNA aqueous solutions 16 . Three distinct regions   17 . The bands at 1715 cm−1 and 1689 cm−1 were assigned to the stretching vibration of guanines involved in triple helical structures and the stretch of paired guanines, respectively.
The strong bands at 1622 cm−1 were due to C=C and C=N ring vibrations of the adenine base 24 . Vibrations localized to sugar-based interactions, which give rise to marker bands sensitive to backbone conformation, glycosidic bond rotation and sugar puckering modes, were observed in the region of 1500–1250 cm−1 25 . The prominent absorption bands at 1481 cm−1, 1436 cm−1 and 1320 cm−1 were due to adenine ring vibrations, adenine in Z-form helices and guanine vibration in S-type sugar formation, respectively 24 . The bands in the last region, between 1250 cm−1 and 800 cm−1, were due to vibrations along the sugar-phosphate chain and are sensitive to the conformation of the nucleic acid backbone. The B-form double helix of DNA appears at 1223 cm−1. The bands at 1175 cm−1 and 1058 cm−1 were also assigned to vibration of the sugar-phosphate backbone and to backbone vibration with a contribution from the C-O stretch, respectively 26 . Other prominent absorption bands in the region of 1000–800 cm−1, including 955 cm−1, 866 cm−1 and 845 cm−1, were also due to the different N- and S-types of nucleic acid sugar puckering and the sugar-phosphate backbone vibrations 24 . PCA and HCA assays were used in this study to analyze the FTIR spectral data obtained from the genomic DNA of Shigella isolated from stool samples and to identify and discriminate the species from each other. The KMO measure and the P-value of Bartlett's test of sphericity indicated that the data were suitable for PCA analysis 27 . The PCA model for the FTIR spectral data indicated that the species of all isolates could be discriminated from each other correctly. The dendrogram obtained from HCA analysis of the FTIR spectral data of the DNA of Shigella species also showed that the FTIR method followed by HCA assay could be used to differentiate all species of Shigella from each other.
However, based on the sensitivity and specificity analysis of the assays, we found that the PCA assay was more sensitive and precise for analysis of the FTIR spectral data of Shigella isolates. Consequently, it was demonstrated that S. dysenteriae, S. flexneri, S. boydii and S. sonnei isolated from stool samples could be well distinguished by the established FTIR-PCA protocol. The FTIR method followed by multivariate statistical or dimension reduction methods has been used successfully in several studies to discriminate the genomic DNA of different eukaryotic and prokaryotic species from each other. Demir et al. successfully differentiated 12 species of wild wheat from each other using attenuated total reflection FTIR followed by HCA and PCA assays. They reported that both analysis methods were useful and generally found the FTIR method sensitive, low cost and rapid for discriminating species of eukaryotic cells 28 . Dinkelacker et al. identified three different species of Klebsiella isolated from clinical samples using FTIR, MALDI-TOF and NGS methods. They showed that all of these methods could discriminate Klebsiella species from each other; however, FTIR showed higher discriminatory power, specificity and sensitivity in recognizing the three species of Klebsiella isolates 29 . Han et al. also used the FTIR method followed by PCA and PLS assays for species-specific analysis of the genomic DNA of meat and bone meals. They evaluated these methods to determine the source of 51 meat and bone meal samples, developed a two-step protocol for the distinguishing analysis, and found the established method completely (100%) sensitive and specific 30 . Another study, conducted by Potocki et al., indicated that Raman spectra-based and FTIR spectra-based molecular fingerprinting methods can be used effectively, with high specificity and sensitivity, to identify different species of clinical Candida isolates 31 . FTIR spectroscopy is used for analytical chemistry experiments 17 .
Recently, this method has been used to characterize biological substances, such as nucleic acids, to rapidly differentiate and identify different organisms in agricultural and medical sciences 32 . We have shown for the first time that FTIR-based DNA fingerprinting reflected the genomic diversity of 91 Shigella isolates collected from stool samples and correctly differentiated the four species of this pathogen. However, the use of the FTIR spectroscopic method for Shigella species differentiation may be limited, since implementing it requires relatively expensive and specific equipment 29 . Few studies have investigated and designed practical assays for the differentiation of Shigella species; biosensor methods have been developed for this purpose 6,10,17,18 , although these assays have been noted to be expensive and complicated to implement. It is worth noting that, compared with other methods such as MALDI-TOF and agglutination assays, using FTIR and chemometrics for bacterial identification and differentiation is highly practical in laboratories where an FTIR device is already available and used for chemical analysis 7,12,16,27,29 . There is still a considerable need to develop rapid, cost-effective, sensitive and specific methods requiring simple equipment to discriminate Shigella species in food, water and clinical samples.

Conclusions
In conclusion, 91 Shigella strains, including 18 S. dysenteriae, 25 S. flexneri, 23 S. boydii and 25 S. sonnei, were isolated from 1862 stool samples collected from patients with acute diarrhoea. The FTIR method followed by PCA and HCA assays was used to analyze the DNA extracted from all Shigella isolates and reference strains. Four distinct and significantly different FTIR spectra reflecting the four species of Shigella were obtained, with significant variations in three definite regions (1800–1550 cm−1, 1550–1100 cm−1 and 1100–800 cm−1) that discriminate these species from each other. We found the FTIR method followed by PCA assay the best, with specificity, sensitivity, differentiation error and correct differentiation rate values of 100%, 100%, 0% and 100%, respectively, for differentiation of all species of Shigella isolated from stool samples. Compared to other molecular techniques, our developed assay is more rapid, relatively cost-effective and convenient.

Methods
Isolation, identification and serotyping of Shigella species from stool samples. Shigella species were isolated from stool samples and differentiated according to the conventional methods used as gold standards, previously described by Mokhtari et al. 33 and Phiri et al. 34 . A loopful of each stool sample, taken with a sterile disposable inoculation loop, was directly inoculated on xylose lysine deoxycholate agar (XLD, Merck, Germany) and incubated aerobically at 37 °C for 24 h. Suspected colonies, including red ones on XLD agar morphologically resembling Shigella, were isolated and subjected to biochemical tests. Lysine iron agar (LIA, Merck, Germany), triple sugar iron (TSI, Merck, Germany), IMViC (Indole, Methyl red, Voges-Proskauer and Citrate tests, Oxoid Ltd., UK), and urease production (Merck, Germany) tests were used to confirm the suspected colonies and identify Shigella in stool samples.
The genus and species of each presumptive Shigella isolate were determined serologically by slide agglutination assay using the commercial Shigella genus and species antisera kits (Difco Co., MI, USA), respectively. All Shigella isolates were serotyped by the serological slide agglutination method using the Shigella serotyping polyvalent antisera kit (Denka Seiken, Japan) according to the manufacturers' instructions.  35 . Twenty-five types of the analysis results were selected and analyzed to evaluate the significant distribution with the infrared intensities and wavenumbers of the Shigella isolates spectral data. The differentiation performance of the developed method was assessed by the specificity, sensitivity, differentiation error and correct differentiation rate of the assay, which were calculated as follows:

$$\text{Sensitivity}\ (\%) = \frac{TP}{TP + FN} \times 100$$

$$\text{Specificity}\ (\%) = \frac{TN}{TN + FP} \times 100$$

$$\text{Differentiation error}\ (\%) = \frac{FP + FN}{TP + TN + FP + FN} \times 100$$

$$\text{Correct differentiation rate}\ (\%) = \frac{TP + TN}{TP + TN + FP + FN} \times 100$$

where the true positive (TP) is the number of Shigella species identified by the developed method and also detected in the samples by the gold standard method; the false positive (FP) is the number identified in the samples by the developed method but not detected by the gold standard; the true negative (TN) is the number neither identified by the developed method nor detected by the gold standard method; and the false negative (FN) is the number detected by the gold standard but not identified in the samples by the developed method 30 . All measurements were carried out in triplicate.
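The four performance measures can be computed directly from the TP, FP, TN and FN counts. The sketch below assumes the standard definitions (sensitivity = TP/(TP + FN), specificity = TN/(TN + FP), differentiation error = (FP + FN)/N and correct differentiation rate = (TP + TN)/N, with N the total count). The example counts for S. boydii under HCA are inferred here from the reported 23 S. boydii isolates, 91 isolates overall and one isolate placed apart; they are not stated explicitly in the text, and the function name is hypothetical.

```python
def differentiation_metrics(tp, fp, tn, fn):
    """Return the four performance measures as percentages."""
    total = tp + fp + tn + fn
    return {
        "sensitivity": 100 * tp / (tp + fn),
        "specificity": 100 * tn / (tn + fp),
        "differentiation_error": 100 * (fp + fn) / total,
        "correct_differentiation_rate": 100 * (tp + tn) / total,
    }

# Inferred counts (assumption): 22 of 23 S. boydii isolates recognized,
# no other isolate labelled S. boydii, 68 isolates of the other species.
BOYDII_HCA = differentiation_metrics(tp=22, fp=0, tn=68, fn=1)

for name, value in BOYDII_HCA.items():
    print(f"{name}: {value:.2f}%")
```

With these inferred counts the sensitivity and correct differentiation rate come out at about 95.65% and 98.90%, in line with the values reported for S. boydii under HCA; the error evaluates to 1/91, about 1.10%, which agrees with the reported 1.09% if that figure was truncated rather than rounded.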

Ethics approval
The Ethics Committee of the College of Veterinary Medicine, University of Tehran, approved the sampling and study protocols (IR.UT.REC.1394.108). In the present study, all research was performed in accordance with relevant guidelines/regulations and the Declaration of Helsinki. Also, for all cases, informed consent was obtained from the patients whose stool specimens were included in this study.
Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.