Discovery and validation of an NMR-based metabolomic profile in urine as TB biomarker

Despite efforts to improve tuberculosis (TB) detection, limitations in access, quality and timeliness of diagnostic services in low- and middle-income countries are challenging for current TB diagnostics. This study aimed to identify and characterise a metabolic profile of TB in urine by high-field nuclear magnetic resonance (NMR) spectrometry and assess whether the TB metabolic profile is also detected by a low-field benchtop NMR spectrometer. We included 189 patients with tuberculosis, 42 patients with pneumococcal pneumonia, 61 individuals infected with latent tuberculosis and 40 uninfected individuals. We acquired the urine spectra from high and low-field NMR. We characterised a TB metabolic fingerprint from the Principal Component Analysis. We developed a classification model from the Partial Least Squares-Discriminant Analysis and evaluated its performance. We identified a metabolic fingerprint of 31 chemical shift regions assigned to eight metabolites (aminoadipic acid, citrate, creatine, creatinine, glucose, mannitol, phenylalanine, and hippurate). The model developed using low-field NMR urine spectra correctly classified 87.32%, 85.21% and 100% of the TB patients compared to pneumococcal pneumonia patients, LTBI and uninfected individuals, respectively. The model validation correctly classified 84.10% of the TB patients. We have identified and characterised a metabolic profile of TB in urine from a high-field NMR spectrometer and have also detected it using a low-field benchtop NMR spectrometer. The models developed from the metabolic profile of TB identified by both NMR technologies were able to discriminate TB patients from the rest of the study groups and the results were not influenced by anti-TB treatment or TB location. This provides a new approach in the search for possible biomarkers for the diagnosis of TB.


Results
Study population. Three hundred and thirty-two participants were included in this study and classified into the following study groups: 189 active TB patients, 42 pneumococcal pneumonia patients, 61 LTBI individuals, and 40 uninfected individuals. Demographics of the study population are shown in Table 1. Of the 332 patients, 64.8% were men of an average of 46 years old (± 17.3). Regarding study groups, most TB (untreated and under treatment) and pneumococcal pneumonia patients and individuals with LTBI were men (79.6%, 70.7%, 54.1%, and 81.0%, respectively), while 75.0% of the uninfected individuals were women. Patients with pneumococcal pneumonia were older than the rest of the study groups (49.6% older, p < 0.001). From the patients with active TB, 49 were enrolled before starting anti-TB treatment and 140 during TB treatment. Patients undergoing TB treatment averaged 39.1 (SD ± 68.9) days of treatment. Most patients (99.3%) had strains sensitive to TB treatment except one patient with rifampicin-resistant strains ( Table 2). Among patients with TB, 78.8% had pulmonary TB, 13.8% had extrapulmonary TB (lymph nodal, pleural, peritoneal, osteoarticular, meningeal and miliary TB), and 7.4% had disseminated TB (Table 2).
Of the 332 urine samples obtained from the individual participants, 169 samples were analysed with HF-NMR to identify a metabolic profile that discriminated TB patients from study controls. Then, the remaining 163 samples plus 85 samples previously analysed by HF-NMR were analysed using an LF benchtop NMR spectrometer to detect the TB metabolic profile in a more compact device that can be installed in conventional laboratories and used as a diagnostic tool. The procedure followed to analyse urine samples by NMR is shown in Fig. 1. Table 1. Demographic information of the study population. Categorical variables expressed as a number of subjects (n) and percentage (%), and quantitative variables expressed as median and standard deviation (SD). TB, tuberculosis; PnP, pneumococcal pneumonia; LTBI, latent TB infection; Uninfected, individuals without infection.  www.nature.com/scientificreports/ Characterisation of the TB metabolic fingerprint using HF-NMR. We applied an unsupervised Principal Component Analysis (PCA) to the urine spectra acquired by HF-NMR (19 patients with untreated TB, 27 patients with pneumococcal pneumonia, 17 individuals with LTBI and 29 uninfected individuals) and detected 3 statistical outliers (2 patients with pneumococcal pneumonia and 1 uninfected individual) that were excluded from the analysis 17 . PCA score plots identified differential metabolic clusters between patients with untreated TB (n = 19) and pneumococcal pneumonia patients (n = 25), LTBI individuals (n = 17), and uninfected individuals (n = 28) (Fig. 2). The variability observed was not explained by the different participant recruitment centres (Supplementary Figure S1). The spectral regions responsible for the metabolic differences in each PCA scores plots between TB patients and control groups (pneumococcal pneumonia, LTBI, and uninfected) were identified in PCA loading plots by Hotteling's T2 tests 18 and correlated with a total of 31 chemical shift regions in the first two Principal Components (PCs) (Fig. 3). This urinary spectral fingerprint (corresponding to 31 spectral regions) was assigned to the following eight metabolites: aminoadipic acid, citrate, creatine, creatinine, glucose, mannitol, phenylalanine, and hippurate (Supplementary Figure S2). Metabolites were quantified to show the statistically substantial differences between study groups (Table 3).
HF-NMR-based profile to discriminate TB. We applied a supervised Partial Least Squares-Discriminant Analysis (PLS-DA) to establish predictive models from the identified spectral fingerprint that differentiated untreated TB cases (n = 19) from control groups (25 pneumococcal pneumonia, 17 LTBI, and 28 uninfected). Thus, when comparing untreated TB and pneumococcal pneumonia patients, 100% of TB patients were correctly classified, with 100.0% of sensitivity and specificity. Similarly, when comparing TB patients and LTBI individuals, 94.0% (Standard Deviation, SD = 5.6%) of TB patients were correctly classified, with 89.4% (SD = 6.5%) and Characterisation of the TB metabolic profile using LF-NMR. To detect the metabolic profile of TB in urine previously characterised using HF-NMR, we applied the unsupervised PCA to the urine spectra acquired by LF-NMR (42 patients with untreated TB, 31 patients with pneumococcal pneumonia, 56 individuals with LTBI and 31 uninfected individuals). We detected 8 statistical outliers (3 untreated TB, 3 LTBIs, and 2 uninfected), which were excluded from the analysis 17 . Thus, the unsupervised PCA was applied to a total of 39 untreated TB patients, 31 pneumococcal pneumonia patients, 53 LTBI individuals, and 29 uninfected individuals ( Fig. 1). PCA score plots of the LF-NMR urine spectra did not show as clear a discrimination as the HF-data did (Fig. 4). However, although LF-NMR spectroscopy provides a lower resolution than HF-NMR spectroscopy, we identified the spectral fingerprint assigned to the eight metabolites (aminoadipic acid, citrate, creatine, creatinine, glucose, mannitol, phenylalanine, and hippurate) in the metabolic profile, which enabled the differentiation of patients with TB from the controls (Fig. 5). www.nature.com/scientificreports/ www.nature.com/scientificreports/ TB discrimination from LF-NMR-based metabolic profile. PLS-DA was applied to establish predictive models to discriminate untreated TB cases (n = 39) from control groups (31 pneumococcal pneumonia, 53 LTBI, and 29 uninfected) from the LF-NMR urinary spectral fingerprint identified. When comparing TB and pneumococcal pneumonia patients, 87.3% (SD = 7.8%) of TB patients were correctly classified, with 94.4% (SD = 3.6%) and 85.62% (SD = 5.5%) sensitivity and specificity, respectively. Similarly, when comparing TB patients and LTBI individuals, 85.2% (SD = 5.8%) of TB patients were correctly classified, with 91.9% (SD = 4.8%) and 90.2% (SD = 3.5%) sensitivity and specificity, respectively. Finally, when comparing TB and uninfected individuals, 100% of TB patients were correctly classified, with 100% of sensitivity and specificity. The predictive model established between untreated TB patients and uninfected individuals was applied to classify the LF-NMR urine spectra of the 88 TB patients under treatment. The model correctly classified 84.1% of TB patients that had not been used to create the predictive model (all of them under treatment) in the TB group. Among these TB patients, 68 had pulmonary TB, 12 had extrapulmonary TB, and 8 had disseminated TB. This model also correctly classified 100% of extrapulmonary TB patients in the TB group.

Discussion
In recent years a lot of effort has been made to identify the highest priority needs in order to improve the diagnostic procedures for TB 1,19 . The need to develop accurate and more accessible diagnostic methods in primary healthcare centres has meant an intensification of the search for biomarkers from non-sputum-based biological samples 20 .
In this study, we have identified and characterised a metabolic profile of TB in urine from a high-field NMR spectrometer and detected the same profile with a low-field NMR spectrometer. The models developed from the metabolic profile of TB identified by both NMR technologies showed the potential to discriminate between TB patients, pneumococcal pneumonia patients, individuals with LTBI and uninfected individuals. In addition, the results of the models developed from this metabolic fingerprinting were not influenced by anti-TB treatment or TB location.  www.nature.com/scientificreports/ When applying the predictive model to the HF and LF-NMR spectra of the TB patients under treatment, 90.9% and 84.1% of the patients in the TB group were correctly classified. Furthermore, the models developed from the HF and LF-NMR-based spectral fingerprints correctly classified TB patients and pneumococcal pneumonia patients with a success rate of 100% and 87.3%, respectively. Therefore, the use of the model developed here could facilitate the identification of patients with TB and rule out those with other respiratory infections, such as pneumococcal pneumonia.
Using the HF-NMR-based spectral fingerprint, our model correctly classified 94.0% of TB patients compared to LTBI individuals, and 85.2% of them when using the LF-NMR-based urine metabolic fingerprint. This may be useful in situations where it is necessary to distinguish between TB and LTBI, such as in children, where early detection of TB is crucial to avoid severe forms of the disease developing. It is well known that neither TST nor IGRAs can distinguish between LTBI and active cases 21,22 . In our experience 23 , a model based on the combination of IFN-γ, IP-10, ferritin and 25-hydroxyvitamin D could improve the detection of patients with subclinical TB. The metabolomic approach we described may be useful in detecting the early stages of the disease. Regarding metabolomics, a multi-site study across Sub-Saharan Africa provided a trans-African metabolic biosignature in serum and plasma to predict the onset of TB before active TB manifestation 24 .
The metabolomic approach we present showed a classification performance within the minimum standards required by the World Health Organization to develop a non-sputum based biomarker test 19 . Although Xpert for detecting TB is still not as sensitive as culture, its rapidity in diagnosing TB has made shortening the delay between diagnosis and early treatment possible 25 . Also considering speed, the ability of the LF benchtop NMR spectrometer to measure samples quickly and easily would enable the integration of the final decision on the diagnosis into the same visit.
In addition, the use of an easily accessible sample would allow for successful implementation in microscopy centres, health posts and primary-care clinics 26 . A urine-based approach would allow the diagnosis of TB in patients who had difficulty in obtaining a representative sample from the site of infection (patients with extrapulmonary TB or not able to produce sputum), a relevant issue especially in LMICs since these patients often present non-specific symptoms of the disease 1 . The Alere Determine TB LAM Ag assay (AlereLAM, Abbott, Chicago, US) was introduced for TB diagnosis in HIV-positive patients by detecting the presence of lipoarabinomannan (LAM) in urine samples 27 . Subsequently, a new generation of urine LAM assays, Fujifilm SILVAMP TB LAM (FujiLAM) assay (Fujifilm, Tokyo, Japan) has been developed 28 , representing improved diagnostics for HIVpositive TB patients. In this regard, the metabolic profile identified in this study might provide a promising tool www.nature.com/scientificreports/ for diagnosing TB from urine samples. In addition, the quantification of the statistically significant metabolites identified in this TB metabolic profile would allow this technology to be adapted to a point-of-care test. Although more studies should be conducted in larger TB cohorts, vulnerable populations (children, HIV, comorbidities), and in other geographic regions to validate the performance of the predictive model based on the metabolic profile identified by LF benchtop NMR technology, these first results are promising. If we consider the target product profiles (TPPs) published by the WHO and partners 26 , although the diagnostic sensitivity of this metabolic model did not reach that achieved with the Xpert (diagnostic sensitivity of > 95% in comparison to culture), the sensitivity it achieved was within the minimum requirements established by the TPPs to develop a rapid non-sputum-based biomarker test for pulmonary TB in adults. Furthermore, with 100% of extrapulmonary TB patients correctly classified into the TB group, this model would fall within the optimal requirements recommended by the published TPPs for developing a rapid non-sputum-based biomarker test for extrapulmonary TB in adults 19 . Therefore, this metabolic approach based on LF-NMR could be a promising candidate for detecting all TB types. In addition, its small size and ease of maintenance would allow the implementation of the technology in primary healthcare centres as an alternative, or complementary to the diagnostic tests already available, without the risk of a shortage of cartridges. This robust benchtop NMR technology, straightforward sample preparation and minimal operational requirements would lead to a better diagnostic performance in LMICs; thus, increasing the number of cases diagnosed and allowing prompt treatment, which would reduce the transmission and mortality burden of TB.
TB is known to be a wasting disease involving weight loss, malnutrition and metabolic disorders 29 . The nutritional source of the bacteria is an essential aspect of host-pathogen interaction 24 . M. tuberculosis has adapted its metabolism to use different nutrient sources and compounds such as carbon or nitrogen sources to promote bacterial growth 30 . The metabolic profile presented in this study is based on the combination of eight metabolites, which is able to distinguish TB patients from those with pneumococcal pneumonia, LTBI, and non-infection. Previous studies have been conducted on serum samples to identify potential biomarkers for TB diagnosis using MS [31][32][33] and NMR 34 . Others have identified TB diagnostic markers from plasma samples in adults 35,36 and children 37 . In this study, TB patients showed reduced concentrations of creatine and phenylalanine compared to LTBI patients. Both metabolites are involved in the metabolism of necessary amino acids and derivatives. During infection in macrophages, M. tuberculosis shows a preference for amino acids as a source of nitrogen 38 . In line with this, TB patients showed high concentrations of creatinine compared to pneumococcal pneumonia patients. Creatinine is a breakdown product of creatine, so it might have a role in the synthesis of nitrogen-containing molecules. M. tuberculosis can co-metabolise multiple amino acids simultaneously 30 . In this metabolic process urea and other amino groups are synthesised from the breakdown of the amino acids. Thus, the low concentration of urine hippurate observed in TB patients might be due to/connected to the synthesis of aromatic amino acids such as tryptophan, tyrosine and phenylalanine 39,40 . Alternatively, Weiner et al. suggested that low serum hippurate concentrations might be related to uremic cytotoxic activity related to vitamin D metabolism 24 . In this study, TB patients showed low glucose and citrate concentrations compared with LTBI patients and uninfected individuals. The involvement of these two metabolites in the oxidative metabolism of carbohydrates, proteins and fats suggests that there is an increase in energy consumption by TB patients 41 . In accordance with this, Ehrt et al. reported the adaptation of M. tuberculosis to co-catabolise multiple carbon substrates, so that it grows faster and more extensively on carbon source mixtures than it does on any single source 30 . This might also explain the high concentrations of mannitol found in TB patients compared with non-infected individuals. However, in this study, we found that TB patients showed higher concentrations of citrate and lower concentrations of intermediary compounds of amino acid metabolism compared to pneumococcal pneumonia patients. This could be explained because the Krebs cycle is most likely not used in the S. pneumoniae metabolism due to the lack of genes encoding the enzymes involved in this pathway. In contrast, the lack of a complete Krebs cycle for S. pneumoniae has a great impact on amino acid synthesis 42 . Ultimately, the metabolites identified in the TB metabolic profile of our study are involved in pathways such as the oxidative metabolism of carbohydrates, proteins and fats, or the metabolism of the necessary amino acids and derivatives 41 .
When considering urine metabolomics, a study conducted in Uganda identified a urine-based metabolite biosignature with potential to monitor the response of individuals to anti-TB therapy 43 . In contrast, another study conducted in South Africa characterised the biological basis of a poor treatment outcome using urine samples collected at diagnosis, during treatment, and after treatment completion 44 . In a study of children and adults with TB, LTBI or uninfected, urine neopterin levels were measured using an enzyme-linked immunosorbent assay predicting that the neopterin/creatinine ratio may be a considerable predictor of disease progression 45 . According to our results, our models can detect TB even in patients who have initiated anti-TB treatment. In recent years, new markers have been identified in urine in response to TB treatment 43 . Future metabolomic studies should be conducted to monitor treatment and evaluate treatment or cure success, and also identify possible re-infections. In addition, host metabolites derived from anti-TB treatment identified in urine could be useful for monitoring treatment and improving patient adherence to TB treatment.
Limitations of this study include the relatively small sample size of TB patients before starting anti-TB treatment. Additionally, not all urine samples could be tested by HF and LF-NMR due to the time lapse between the analysis of the samples by the two NMR spectrometers. Therefore, some patients with samples tested for HF were not tested for LF-NMR and vice versa. Although this did not affect the results of the study, future research should test the discriminatory potential of the identified TB spectral fingerprint in a consecutive series of patients in a country with a high incidence of TB and co-infections. This proof of concept represents a first step towards the development of an affordable metabolomic test for the diagnosis of TB.
In this study, we have identified and characterised a metabolic profile for TB in urine with potential to discriminate TB patients from the rest of the study groups; this metabolic profile for TB has also been detected using an LF benchtop NMR spectrometer. www.nature.com/scientificreports/ in microscopy centres, health posts and primary care clinics, improving access to TB diagnosis. In addition, the ability of the model developed through this urine-based NMR technology to detect extrapulmonary TB and TB in patients under treatment is a step forward for the search for new diagnostics for TB that are not sputum-based and that can detect all forms of TB. In summary, the identification of a metabolic profile for TB in urine from NMR with the potential to discriminate TB patients from pneumococcal pneumonia patients, individuals with LTBI, and uninfected individuals highlights the application of metabolomics as a new approach in the search of non-sputum-based new potential biomarkers for TB diagnosis. Participants were classified into four study groups: patients diagnosed with active TB, patients with pneumococcal pneumonia, individuals with LTBI, and uninfected individuals.

Methods
All active TB patients had clinical and radiological signs compatible with TB and were microbiologically confirmed by culture and/or Xpert MTB/RIF.
Patients with pneumococcal pneumonia were diagnosed by isolating the bacteria in blood culture and/or detecting the pneumococcal urinary antigen. Patients with pneumococcal pneumonia were collected before starting the antibiotic treatment.
LTBI individuals were recruited from contact tracing studies with a positive TST and/or IGRA but without clinical and radiological signs consistent with active TB. To perform the TST, we used a 2-TU dosage of PPD-RT (Statens Serum Institut, Copenhagen, Denmark). The performance and interpretation of the results of the Mantoux test were carried out following the Spanish guidelines 46 . IGRAs testing was performed using the commercially available enzyme-linked immunosorbent assay (ELISA) QuantiFERON-TB Gold In-Tube test (QFT, Qiagen, Hilden, Germany) and/or the Enzyme-Linked Immunospot (ELISPOT) assay T-SPOT.TB blood test (T-SPOT.TB; Oxford Immunotec Ltd, Oxford, UK) following the manufacturer's protocol. The LTBI patients were included before starting chemoprophylaxis.
The uninfected individuals were volunteers with no evidence of M. tuberculosis infection and with negative TST and IGRA tests.
Collection and preparation of urine samples for NMR analysis. We collected midstream urine samples from all participants of the study in sterile, universal plastic containers following standardised procedures 13 . Urine samples were aliquoted into 2 ml cryovials with screw caps and frozen at − 20 °C until the NMR experiments were performed. Before analysis, urine samples were thawed at room temperature and vortexed 30 s before use. We then aliquoted 400 µl of urine samples into Eppendorf tubes and added 250 µl of 0.2 M phosphate buffer solution containing 0.09% NaN 3 to adjust the internal pH to 7.4. We adjusted the axis of chemical shifts to a signal reference at 0 ppm adding 0.3 mM trimethylsilyl propanoic acid (TSP) in deuterated water dissolution in the preparation of sample. Azide was added during the preparation of the urine samples to avoid bacterial contamination 14 . Buffered urines were vortexed for 30 s and centrifuged at 12,000g for 5 min. Then, we transferred 600 µl aliquot of the supernatant into 5 mm diameter NMR tubes (CortecNet, Les Ulis, France) for proton ( 1 H) NMR acquisition. Figure 1 shows how many of these samples were analysed by HF and LF NMR. NMR spectral acquisition and processing. HF-NMR urine spectra were acquired using a Bruker 700 MHz NMR spectrometer (CNIO, Madrid, Spain) operating at a frequency of 697.87 MHz. Shimming and NMR preparation time was reduced to a minimum, while the sample for NMR analysis was chilled to 4 °C to minimise metabolic changes. The acquisition of the spectra was performed in accordance with the standardised protocols previously described 47 . A number of bidimensional homonuclear and heteronuclear experiments such as standard gradient-enhanced correlation spectroscopy (COSY), 1 H-1 H total correlated spectroscopy (TOCSY), and gradient-selected heteronuclear single quantum correlation (HSQC) protocols were performed to carry out component assignments. Between consecutive two-dimensional (2D) spectra, a control 1 H NMR spectrum was always measured. No gross degradation was noted in the signals of multiple spectra acquired under the same conditions. Standard solvent-suppressed spectra were grouped into 32,000 data points, averaged over 256 acquisitions. The data acquisition lasted a total of 13 min using a sequence based on the first increment of the nuclear Overhauser effect spectroscopy (NOESY) pulse sequence to effect suppression of the water signal (δ = ~ 4.80 ppm). Sample acquisitions were performed using a spectral width of 8333.33 Hz prior to Fourier transformation, and the free induction decay (FID) signals were multiplied by an exponential weight function corresponding to a line broadening of 0.3 Hz.
LF-NMR urine spectra were acquired using a Magritek Spinsolve 60 Ultra Benchtop spectrometer (Magritek GmbH, Aachen, Germany) at a frequency of 60 MHz using a one-dimensional presaturation (1D PRESAT) www.nature.com/scientificreports/ sequence to allow for efficient saturation of the water signal (δ = ~ 4.95 ppm) following the previously described procedures 15 . NMR data were processed and editing using MestReNova software (v.14; Mestrelab Research, Santiago de Compostela, Spain) according to the established protocols described in a previous study 47 . Metabolite signals of the spectra were shift-aligned using trimethylsilyl propanoic acid (TSP) as a reference signal standard (δ = 0.00 ppm). From the raw NMR spectra, the chemical shift region from 5.00 to 5.20 ppm was excluded from the analysis to remove the random effects of variation in the urine and water resonance suppression (δ = 6.50 to 4.22 ppm). Similarly, the chemical shift region from 0 to 0.04 ppm containing the internal reference (TSP) was excluded from the statistical analyses. Baseline correction was performed automatically using the 'Withakker Smoother' algorithm. Binning (also known as bucketing) was applied to NMR spectra and data-reduced to equal length integral segments (bins) of δ = 0.04 ppm to compensate variations in resonance positions. All bins were normalized by the total sum of the spectral regions (each bin was divided by the sum of all the NMR signals). Thus, the concentration of each metabolite was normalized by the urine concentration to compare these concentrations (in arbitrary units) between samples. Relative intensity was calculated as the original intensity normalized by total sum of the spectral regions to compensate urine concentration and to ensure that all observations were directly comparable.

Data analysis. Statistical analysis of the study population.
A descriptive analysis of the subjects who participated in the study was performed according to the study groups. Frequencies and percentages described the qualitative variables, while the mean and standard deviation described the quantitative variables. For comparisons between study groups, we used the chi-squared test in the case of qualitative variables, and the analysis of variance (ANOVA) in the case of quantitative variables. The level of significance was fixed at 0.05. Analyses were performed using the statistical software IBM SPSS Statistics v.25 (SPSS, Chicago, US).
Statistical analysis of metabolomic data. Data from 1 H NMR spectra were analysed in a multivariate manner using the Metabonomic package of R software (rel.3.3.1) 48 . NMR spectra were data-reduced to equal length integral segments of δ = 0.04 ppm to compensate variations in resonance positions, and they were normalized by total sum of the spectral regions. Prior to multivariate statistical analysis, spectral data were Pareto scaled 49 . Unsupervised (blinded) data were analysed by PCA by the "prcomp" function from the statistical library and supervised (unblinded) analysis was by PLS-DA by the "gpls" function from the "gpls" package allowing separation between no more than two classes of samples.
PCA was applied to represent the variance of all metabolomic variables present in the data-reduced NMR spectra in a low-dimensional space (bins of δ = 0.04 ppm) 50 to identify a differential metabolic pattern of TB to be used as potential biomarkers for TB diagnosis. Thus, all the spectral regions grouped in bins of δ = 0.04 ppm were transformed into a new set of orthogonal variables known as PCs. The first PC was defined by the spectral profile (load) in the data describing most of the variation; the second PC, was the second-best profile describing the variation, and so on, so that the retention of the variation present in the original variables decreased as we went down the order.
The PCs are composed of the scores and loadings. On one hand, the scores hold information about the samples (concentrations). Thus, PCA score plots of the first two or three PCs were used to visually observe the differences between the samples and immediately display sample clustering patterns according to their elemental composition 51,52 . In addition, PCA score plots were used to highlight statistical outliers. We use the Mahalanobis distance to confirm statistical outliers; this consists of calculating the distance from a data point to the centroid of all samples. Mahalanobis distance was calculated for PC1, PC2, and PC3. A single case was considered a statistical outlier if it was placed out of the tolerance ellipse of 97.5% 17 . On the other hand, the loadings hold information about the variables of the data set (chemical changes), indicating the importance of each region in explaining the variance between samples. Therefore, PCA loading plots were used to identify the multiple regions (δ = 0.04 ppm bins) of the 1 H NMR spectra responsible for the separation between groups (the so-called metabolic fingerprint). The spectral regions (potential biomarkers for TB diagnosis) selected from PCA loading plots were confirmed by Hotteling's T2 tests 18 . Those regions outside the 95% tolerance ellipse were identified as the spectral regions responsible for the metabolic differences in the PCA plots. Hotteling's T2 test was applied for each PCA: (1) TB and uninfected groups, (2) TB and pneumococcal pneumonia groups, and (3) TB and LTBI groups.
The identification of the metabolites corresponding to the metabolic fingerprint was performed using the Human Metabolome Database 53 and the characteristic cross-peaks from 2D HSQC spectra. The identified metabolites were individually integrated for metabolic quantification applying the Global Spectral Deconvolution analysis algorithm provided by the MestReNova software and corrected by the multiplicity of the NMR signal. Statistical significance was determined using a Bonferroni corrected Student's t-test assuming significant unequal corrected variance with p < 0.05 54 .
PLS-DA was applied by classifying patients into two groups 51 . We used the algorithm proposed by Ding and Gentleman et al. 55 (tolerance for convergence: 1 × 10 -3 , the maximum number of iterations allowed: 100). The number of PLS components used was chosen by the percentage of variance explained, the R2, and the mean squared error of cross-validation graphics. Thus, PLS-DA was applied to the δ = 0.04 ppm bucketed NMR spectra of the following groups: TB vs. pneumococcal pneumonia, TB vs. LTBI, and TB vs. uninfected. PLS-DA predictive models were performed to assess and validate the diagnostic accuracy of the fingerprints of the metabolites present used to discriminate TB cases from the control groups. Before the comparison between groups (TB vs. pneumococcal pneumonia, TB vs. LTBI, and TB vs. uninfected), samples from each group were divided into two sets: the training set (50%) and the test set (50%). For training purposes, the classification functions derived from the probability of belonging to each group were computed with a number of random testing subjects. These www.nature.com/scientificreports/ classification functions were used afterwards to classify the rest of the subjects for internal validation. This process was repeated 100 times with random permutations of the training and test sets to reduce type I errors 55 . The percentages of correct classification were calculated as a measure of model performance. www.nature.com/scientificreports/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creat iveco mmons .org/licen ses/by/4.0/.