Diagnosis of tuberculosis (TB) in children remains challenging, and developing better diagnostics is a priority1,2,3. Diagnostic tools based on detecting Mycobacterium tuberculosis (M.tb), including smear microscopy, culture, and Xpert MTB/RIF, perform well in adults. However, they fail to diagnose two-thirds of children with suspected TB, due to the paucibacillary nature of paediatric disease2,3,4,5,6. Age, as well as the development of the immune system, further complicates assessment, as the performance of tests for latent TB infection varies with age2,7,8. The non-specific clinical presentation of TB in children presents a further diagnostic challenge3,9. In low- and middle-income countries, where the majority of the burden of TB disease lies, approximately 40% of patients are incorrectly diagnosed10. Consequently, many children are not appropriately treated, and over 210,000 children are estimated to die every year11.

Developing novel diagnostics is a key component of the global End TB Strategy, and the goal of zero childhood TB deaths1,12. Host-based biomarkers of TB show promise, with gene expression signatures and flow cytometry techniques potentially capable of distinguishing TB cases from controls, including in children, but published studies are often small and prone to bias13,14,15,16,17. While these data provide confidence in the concept of a host-derived diagnostic signature, both gene expression profiling and flow cytometric measurements currently remain far removed from a point-of-care test. Protein-based diagnostics could be easier to translate into a point-of-care test, although paediatric data are sparse18. A recent systematic review of TB biomarker data published since 2010 in over 400 scientific papers shows the overall activity in the field, but only 6% of the data related to studies in children19.

Here, we report the application of 1H Nuclear Magnetic Resonance (NMR) spectroscopy and untargeted Ultra-performance Liquid Chromatography-Mass Spectrometry (UPLC-MS) based assays, to identify diagnostic biomarkers of TB disease amongst children with presumptive TB in The Gambia. We validated our findings in a distinct cohort of children with TB from the UK. Novel host biomarkers of paediatric TB were detected and specific metabolites identified, showing promising diagnostic potential on a readily available biofluid.


Discovery cohort patient characteristics

The demographic characteristics, case classification, and number of samples analysed on each analytical platform are shown in Table 1. Twenty-two children had bacteriologically confirmed TB and 33 fulfilled the category of clinically diagnosed TB. There were no significant differences in weight and age between the different case classifications. All children were HIV negative.

Table 1 Mean weight, age, and sex of participants, and the number of participants in each diagnostic category; ranges are given in brackets. Data refer to samples analysed using lipidomics.

Combining clinically diagnosed and bacteriologically-confirmed TB participants

The metabolic profiles of the 33 participants with clinically diagnosed TB were compared to those of the 22 participants with bacteriologically-confirmed TB. Unsupervised (blinded) Principal Component Analysis (PCA) was used to identify the source of the greatest variation in the data, and to establish whether there were obvious groupings in the scores plot of participants (similar profiles are closer together).

Supervised (unblinded) orthogonal partial least squares-discriminant analysis (OPLS-DA) was then used to establish whether there were systematic differences in the metabolic profiles between the two groups. The metabolic profiles of children with clinical TB compared to microbiologically-confirmed TB were indistinguishable, reflected by the OPLS-DA models’ low or negative predictive ability (Q2Y scores) (model values given in Supplementary Table 1, Supplementary statistical analysis methods). Therefore, the clinically diagnosed and bacteriologically-confirmed TB patients were grouped together in further analyses to enable greater analytical power, and are subsequently referred to as the ‘TB disease’ group.
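As a reminder of why a Q2Y score can be low or even negative, the conventional definitions of the two model statistics are sketched below. The notation is an assumption made for illustration and is not taken from the Supplementary Information: $y_i$ is the class label of participant $i$, $\hat{y}_i$ the fitted value, $\hat{y}_{(i)}$ the prediction for participant $i$ from a model built without that participant, and $\bar{y}$ the mean label.

```latex
R^2Y = 1 - \frac{\sum_i \left(y_i - \hat{y}_i\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2},
\qquad
Q^2Y = 1 - \frac{\sum_i \left(y_i - \hat{y}_{(i)}\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2}
```

Under these definitions Q2Y falls below zero whenever the cross-validated predictions are further from the true labels than the mean label would be, which is the behaviour expected when two groups cannot be told apart.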

Metabonomic analysis

PCA models were produced using the data acquired from each of the four analytical platforms to identify contributors to variance in the metabolic profiles amongst all participants. The PCA scores plots are shown in Supplementary Fig. 1.

OPLS modelling identified weight as a factor influencing the metabolic profiles obtained (SI, Table 2) and therefore subsequent OPLS-DA models were adjusted for weight.

1H NMR characterisation of the TB phenotype

Figure 1 displays the mean values from 500 iterations of the OPLS-DA model comparing the 1H NMR metabolic profiles of the TB disease and other diseases groups. Spectral regions discriminating between other diseases and TB disease samples, and corresponding putative metabolite identifications, are shown in Table 2. The OPLS-DA model values are shown in Table 3, and scores plots in Supplementary Fig. 2a (PCA) and 3a (OPLS-DA).

Figure 1

OPLS-DA model based on 1H NMR spectroscopy data, with 500 model iterations, separating Gambian infants at enrolment based on diagnosis, R2Y = 0.78, Q2Y = 0.30, n = 93 (one sample was excluded as there was no information on the participant’s weight, and another was excluded as it was an outlier). (A) The upper panel shows the median 1H spectra of the serum, with peaks that were statistically significantly different between the two groups highlighted. Peaks in red were found in higher concentrations in the TB disease group, while peaks in blue were found in higher concentrations in the other diseases group. This was plotted for easier identification of the peaks and their corresponding metabolites. The lower panel displays a skyline significance plot of significant variables discriminating between the groups. Variables in red above the dashed line are statistically significantly increased in samples from the TB disease group; the strength of the correlation is displayed by the distance from the dashed line, with variables further away being more strongly associated with that group. Variables in blue below the dashed line are increased in the other diseases group (that is, found in lower concentrations in the TB disease group). (B) OPLS-DA scores plot, displaying the correlations in the 1H spectra between the participants (the closer the scores are, the more similar the participants’ 1H NMR spectra are to one another). Red squares represent the TB disease group, blue circles represent the other diseases group, and green crosses represent the validation group samples. TOSC, T orthogonal signal correction; TCV, T cross validation.

Table 2 Chemical shifts discriminating between TB and other diseases in Gambian children at enrolment. Peaks without significant variables on either side were ignored.
Table 3 Sensitivity, false positive rate, specificity and false negative rates for each of the analytical platforms employed and 95% CI.

TB disease sera showed increased concentrations of glutamate, N- and O-acetyl glycoproteins (GlycA) and phenylalanine, and lower concentrations of alanine, compared to the other diseases group, as shown in Fig. 1 and Table 2.

UPLC-MS characterisation of the TB phenotype

To maximize the metabolic phenotype coverage, three complementary chromatographic separations were utilised: lipidomic profiling with positive and negative electrospray ionization (ESI), for the detection of complex lipid species (lipidomics ESI+ and lipidomics ESI−, respectively), and hydrophilic interaction liquid chromatography (HILIC), for the detection of polar molecules.

OPLS-DA models were produced for the UPLC-MS data, comparing the TB disease group against other diseases, to identify systematic distinguishing metabolic variations. Model values are given in Table 3 and Fig. 2, SI. Scores plots for these models are shown in Fig. 3, SI. The OPLS-DA models for all three chromatographic separation methods showed similar predictive abilities.

Lipidomics ESI+ identified ganglioside GM3 (d18:1/16:0), triacylglycerides (16:0/18:1/18:1 and 54:2), and hexose ceramide (d18:1/16:0) as important metabolites distinguishing between the two groups, all of which were increased in the TB disease group relative to other diseases. Lipidomics ESI− showed elevated levels of ceramides (d18:1/16:0, d18:1/20:0, and d18:1/22:0) in the TB disease group. Additional metabolites increased in the TB disease group included lactosylceramide (d18:1/16:0) and HEX-ceramide (d18:1/16:0). Variables identified as distinguishing between the groups are given in Fig. 4, SI, and SI Tables 3–5.

The discriminant features did not share their molecular mass, retention time, or fragmentation patterns with molecules of mycobacterial origin and were therefore presumably of host rather than pathogen origin.

Analysis of bacteriologically confirmed TB compared with other diseases

To investigate whether comparing only bacteriologically confirmed TB cases against other diseases gave improved predictions of case classification, OPLS-DA models were produced. OPLS-DA model values for this comparison are given in Table 3. These models provide a similar predictive capability as the models including the clinically diagnosed participants in the TB disease group, further justifying our approach to combine the TB disease groups into one.

Diagnostic discrimination

Figure 2 displays the Receiver Operating Characteristic (ROC) curves for the 1H NMR spectroscopy and mass spectrometry data, with sensitivity, specificity and AUC values summarised in Table 3.

Figure 2

ROC curves displaying the sensitivity and specificity of the OPLS-DA model from data acquired by (A) 1H NMR spectroscopy, AUC = 0.78 (B) HILIC, AUC = 0.76 (C) Lipidomics ESI−, AUC = 0.78 (D) Lipidomics ESI+, AUC = 0.78. The red line is the line of no-discrimination and the green line gives the slope of the best result.
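For readers who want to reproduce an AUC from model scores, the sketch below uses the rank-based interpretation of the statistic: the probability that a randomly chosen TB case receives a higher score than a randomly chosen control, with ties counted as one half. The function name and the score lists are hypothetical illustrations, not study data, and the paper does not state which software computed its AUC values.

```python
def auc_from_scores(case_scores, control_scores):
    """AUC as the fraction of case/control pairs ranked correctly (ties count 0.5)."""
    if not case_scores or not control_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical cross-validated model scores (NOT data from this study).
tb_scores = [0.9, 0.7, 0.6, 0.4]
other_scores = [0.8, 0.5, 0.3, 0.2, 0.1]

auc = auc_from_scores(tb_scores, other_scores)
print(auc)  # 16 of 20 pairs are ranked correctly -> 0.8
```

An AUC of 0.5 corresponds to the line of no-discrimination in the figure, and 1.0 to perfect separation of the two groups.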

The metabolic signature for TB disease acquired using 1H NMR data displayed a sensitivity of 69% (95% confidence interval [CI], 56–73%) and a specificity of 83% (95% CI, 73–93%) with an overall AUC of 0.78.

The data acquired using HILIC, lipidomics ESI−, and lipidomics ESI+ displayed respective sensitivities for TB disease of 59% (95% CI, 49–67%), 58% (95% CI, 53–64%) and 67% (95% CI, 60–71%), and specificities of 89% (95% CI, 75–92%), 89% (95% CI, 80–96%) and 86% (95% CI, 75–93%). The AUC for HILIC, lipidomics ESI−, and lipidomics ESI+ data were 0.76, 0.78, and 0.78, respectively.
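The quantities reported above follow directly from a confusion matrix, as the minimal sketch below illustrates. The counts are hypothetical and are not taken from the SI confusion matrices. The Wilson score interval is used here only as a common stand-in, since the text does not state how the 95% CIs were derived (they may instead come from the repeated model iterations).

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


def diagnostic_summary(tp, fn, tn, fp):
    """Sensitivity, specificity and their complements from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_negative_rate": 1 - sensitivity,
        "false_positive_rate": 1 - specificity,
        "sensitivity_ci": wilson_interval(tp, tp + fn),
        "specificity_ci": wilson_interval(tn, tn + fp),
    }


# Hypothetical counts (NOT study data): 30 TB cases, 60 other-disease controls.
summary = diagnostic_summary(tp=20, fn=10, tn=50, fp=10)
for name, value in summary.items():
    print(name, value)
```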

Table 4 provides sensitivity and specificity values for other diagnostic tests already in routine use for the diagnosis of paediatric TB.

Table 4 Diagnostic performances of routine and potential methods for TB in paediatric populations relative to the metabonomic signatures in this study.

Confusion matrices for 1H NMR, HILIC and lipidomics ESI− mode are shown in SI Tables 6–9.

UK validation cohort

Data acquired from the Gambian samples were used to predict the TB disease status of a validation cohort of UK samples, which likewise included bacteriologically-confirmed and clinically diagnosed TB cases recruited from household contacts. 1H NMR spectroscopy correctly classified 6/30 (20%) of the validation cohort. HILIC, lipidomics ESI− and lipidomics ESI+ mass spectrometry data correctly classified 15/35 (43%), 20/36 (56%) and 30/36 (83%) of the UK validation cohort, respectively.

Using the model built from data comparing only bacteriologically confirmed TB against the other diseases groups from the Gambian samples, 1H NMR spectroscopy was able to correctly classify 26/30 (87%) of the validation cohort. The HILIC, lipidomics ESI− and lipidomics ESI+ mass spectrometry data correctly classified 29/35 (83%), 24/36 (68%) and 30/36 (83%) of the UK validation cohort, respectively.


This study from The Gambia and the UK used metabolic profiling assays applied to the serum of children with presumptive TB to detect and validate novel diagnostic biomarkers and alterations in host metabolism due to TB disease. The most discriminatory MS data showed a sensitivity of 67%, specificity of 85% and correctly classified 83% of the validation cohort.

1H NMR spectroscopy has previously been applied to paediatric plasma samples to assess its diagnostic potential in TB20. Sun et al. also recruited a validation cohort to test the model produced, obtaining an AUC of 0.795, a sensitivity of 82.4%, and a specificity of 83.9%20, similar to the values we obtained.

Target product profiles have been developed for new TB diagnostics by the WHO21. The WHO recommends the sensitivity of a new diagnostic test for pulmonary TB in children to be equal or above 66% for bacteriologically-confirmed TB (equal to the sensitivity of the Xpert MTB/RIF assay), while recommending a specificity of 98% for childhood TB, compared to a microbiological reference standard21.

The tests described here therefore meet the optimal sensitivity requirement for both 1H NMR spectroscopy and lipidomics ESI+, but not for HILIC or lipidomics ESI−. None of the analytical approaches met the WHO requirement for specificity, and they would therefore require further development before being used as a rule-out test. However, within a step-wise screening algorithm, sensitivity would rank higher than specificity.
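The comparison against the WHO targets can be made explicit with a short check. The thresholds (66% sensitivity, 98% specificity) and the per-platform point estimates are those quoted in the text; the variable and function names are illustrative only, and confidence intervals are ignored in this sketch.

```python
# WHO target product profile thresholds as cited in the text (percent).
WHO_MIN_SENSITIVITY = 66
WHO_MIN_SPECIFICITY = 98

# Point estimates (sensitivity %, specificity %) reported for each platform.
reported = {
    "1H NMR": (69, 83),
    "HILIC": (59, 89),
    "Lipidomics ESI-": (58, 89),
    "Lipidomics ESI+": (67, 86),
}


def meets_targets(sensitivity, specificity):
    """Return (sensitivity target met, specificity target met)."""
    return sensitivity >= WHO_MIN_SENSITIVITY, specificity >= WHO_MIN_SPECIFICITY


results = {name: meets_targets(*values) for name, values in reported.items()}
for name, (sens_ok, spec_ok) in results.items():
    print(f"{name}: sensitivity target met={sens_ok}, specificity target met={spec_ok}")
```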

A strength of the current study is the inclusion of symptomatic children with other diseases as a control group, given the real life situation facing health care professionals in the field who have to be able to distinguish between these two entities in the presence of overlapping symptoms. One of the listed characteristics of an ideal paediatric TB biomarker is indeed the ability to discriminate children with TB disease from those infected with other pathogens22.

We also detected novel biomarkers in a readily available, non-sputum based biofluid, particularly important for the diagnosis of paediatric TB, due to the difficulty of obtaining respiratory samples from children. Furthermore, the UK validation cohort confirmed that lipidomics ESI+ can detect TB in children from different environmental and genetic backgrounds, supporting this methodology.

Comparing the metabonomic approaches, analysis using 1H NMR spectroscopy provides robust and reproducible data while MS has greater sensitivity. The complementary chromatographic methods used allow the detection of different classes of metabolites, such as lipids and hydrophilic metabolites, including those of potentially mycobacterial origin. Detecting bacterial lipids in human tissue has been demonstrated previously23, but given the paucibacillary nature of TB in children, it is especially challenging in this context, and indeed we did not identify any compounds of mycobacterial origin.

In this study, phenylalanine was increased in children with TB disease. Phenylalanine levels are known to differ in serum during TB disease, although it has not previously been demonstrated in children. Che et al. observed lower levels of phenylalanine, decreasing 2.73 fold in the TB group24. However, Weiner et al. identified phenylalanine as increasing in relative abundance in their TB group, while a further study by Zhou et al. also identified increased concentrations of phenylalanine in TB disease25, corroborating our results. Similarly, another study analysing urine samples identified dysregulated phenylalanine metabolism in TB patients, possibly due to altered gut microbiota26.

We also identified increased concentrations of glutamate in the serum of children with TB, in line with findings by Zhou et al.25 and Frediani et al.27. M.tb is able to employ glutamate as an alternative carbon source under hypoxic conditions28, and Frediani et al. hypothesised that the increased concentrations of glutamate observed in TB patients’ blood may result from increased glutamate synthesis by M.tb as a sign of host-pathogen metabolic interactions27.

Another amino acid, alanine, has previously been shown to be increased in paediatric TB disease20. However, Zhou et al. identified the opposite relationship, with alanine decreasing25, as we also describe. This may be a consequence of increased amino acid oxidation in comparison to protein anabolism, which contributes to the wasting associated with TB29.

1H NMR spectroscopy showed elevated levels of GlycA in the serum of TB patients in our study. This biomarker is associated with chronic inflammation and long-term risk of severe infection, such as septicaemia and pneumonia30.

We also identified increased concentrations of several ceramides in the serum of children with TB disease, particularly ceramide (d18:1/16:0), as well as several glycosylated species of ceramide (d18:1/16:0). This finding supports previous findings of increased concentrations of ceramide (d18:1/16:0) in adult TB patients in comparison to healthy control patients, and patients with community-acquired pneumonia31. Raised levels of ceramide (d18:1/16:0) have also been reported in patients with TB in comparison to those with lung cancer32.

Ceramides are a type of sphingolipid (sphingosine plus a fatty acid), and are present in high concentrations in cell membranes. Ceramides are considered potent bioactive lipids, involved in multiple cellular signalling pathways33. Ceramide contributes to phagosome maturation in macrophages infected with M.tb, resulting in increased killing of pathogenic mycobacteria34. Furthermore, activation of natural killer T cells by the CD1d ligand α-galactosylceramide has been shown to protect mice against TB. Treatment with α-galactosylceramide has been shown to reduce the bacterial burden in the lungs, while diminishing tissue injury and prolonging survival35. Ceramides are also known to contribute to cellular invasion36, apoptosis37, and cell-cell signalling38, all of which relate intimately to microbial pathogenesis.

Similarly, ganglioside GM3 (d18:1/16:0), another sphingolipid, was increased in children with TB disease in this study. Gangliosides are degraded to ceramides by removal of the sugar units in the oligosaccharide head group and, like ceramides, are present in cell membranes, being particularly concentrated in lipid rafts39. The oligosaccharide groups found on gangliosides protrude from the cell membrane40, and are involved in cell-cell interactions, signal transduction and cell activation. Gangliosides have previously been shown to be involved in M. leprae infection41.

We hypothesise that ceramides and gangliosides are increased in serum of children with TB disease as a consequence of the immune system’s attempt to kill M.tb through the maturation of phagosomes in macrophages, as well as the production of lipid rafts as signalling platforms, to internalise M.tb, induce apoptosis and regulate cytokine responses.

Our study has some clear limitations. Our sample size was modest and, as with many paediatric TB studies, a high proportion of cases were diagnosed using a clinical rather than microbiological case definition. There was no blinding for the reading of chest X-rays, which were nevertheless read by two independent doctors. Data on nutritional status beyond weight are unfortunately not available as an analytical variable, although weight correlated strongly with age. The validation cohort did not include an “other diseases” group, and therefore specificity in this UK population could not be evaluated.

We included symptomatic children in our study to address the “real-world” problem of assessing symptomatic children who have recently been exposed to TB. The model helps to discriminate between TB and other diseases. However, it is unknown if the model would discriminate between children with asymptomatic TB infection and symptomatic disease. This is a subject for a subsequent study.

Additionally, not every laboratory method could be applied to all samples due to limited serum volumes. We are unsure why the model using only bacteriologically confirmed TB from The Gambia was so much better at predicting the disease status of the UK validation cohort than the combination of bacteriologically and clinically diagnosed TB cases. Children in the bacteriologically confirmed cohort in the UK were, however, older and more likely to be IGRA- and TST-positive. We fully acknowledge that further validation will need to be undertaken to confirm the biomarkers identified in this study in a larger population, and in the context of HIV co-infection, as all of the children enrolled in the UK and The Gambia were HIV-negative.

In conclusion, we have demonstrated that alterations in host metabolism in paediatric TB are detectable using metabonomic techniques applied to small volume serum samples. The metabolic profiles provide insights into the metabolic processes associated with TB, but further validation is required to assess the clinical utility of this diagnostic approach in the context of screening algorithms.

Participants and Methods

Serum samples were obtained from prospectively recruited Gambian children with presumptive TB, who were identified by household TB contact tracing, or referred directly from community health centres, to a dedicated paediatric TB clinic at the MRC Unit The Gambia, as previously described4,6.

All children living in the same household as an adult with pulmonary TB in the Greater Banjul area were screened with a symptom questionnaire and tuberculin skin test (TST). Any child with symptoms compatible with TB was referred to the childhood TB clinic for further investigation, including CXR, microbiological investigations, and blood samples collected for immune profiling studies.

The TB disease status of participants was defined in accordance with the case definitions proposed by the World Health Organisation42, classified as either bacteriologically-confirmed or clinically diagnosed TB, given that the proposed NIH classification excludes children with a record of household exposure43.

The “other diseases” group included participants from the same household cohort whose symptoms were potentially compatible with pulmonary TB but resolved spontaneously, or with short-course conventional antibiotic treatment, and who had no radiological evidence of TB disease, no bacteriological confirmation and did not develop TB disease during the 12 months of regular follow up of the cohort.

Samples from all children consecutively diagnosed with TB disease between February 2012 and June 2014 were included, together with a random selection of children with other diseases from the same setting and investigated during the same time period, at a ratio of 3:1 with TB cases. The selection criteria have been previously described44. The Supplementary Information (SI) provides further details on the case definitions used (Supplementary Methods).

Ethics approval was granted by The Gambia Government/Medical Research Council Joint Ethics Committee (ref L2012.E01).

Samples obtained from the UK were part of the NIHR-funded IGRA Kids Study (NIKS), a prospective multicentre collaborative study aiming to assess the negative predictive value of IGRA in children exposed to TB7,45. As part of the mandatory TB contact-tracing undertaken according to national guidelines in the UK, all children (<15 years) with a history of household exposure to a source case, presenting to five paediatric TB clinics in London, together with paediatric TB clinics in Southampton, Bristol, Birmingham, Manchester, Glasgow and Newcastle between 1 January 2011 and 31 December 2014, were recruited for screening and investigations. Evaluations included history, examination, TST and IGRA tests, chest radiography, microbiology and HIV testing where appropriate. Samples included from the NIKS study came from consecutively recruited participants with either clinically diagnosed or bacteriologically-confirmed TB. The NIKS study was approved by the National Research Ethics Service (REC: 11/11/11) and cohort details have previously been published. All research was performed in accordance with the relevant regulations and informed consent was obtained from the legal guardians of all participants.

Sample preparation and analysis

Serum was collected at the time of enrolment, separated within 4 hours, aliquoted, and stored at −80 °C prior to shipping on dry ice to Imperial College London, where samples were preserved at −80 °C until analysis at the Clinical Phenotyping Centre.

Sample handling and quality control procedures have been reported previously46 and details on sample preparation are included in the SI.

All UPLC-MS analyses were performed on Acquity UPLC instruments coupled to Xevo G2-XS oaTOF mass spectrometers (Waters Corp., Manchester, UK) via a Zspray ESI source.

Details of the system configuration and analytical methods used for HILIC profiling have been reported previously46, with the exception of the sample preparation procedure, which was modified for application to serum and is reported in the SI.

Lipidomic profiling was conducted using an Acquity 2.1 × 100 mm BEH C8 column thermostated at 55 °C. Solvent A consisted of a 50:25:25 mixture of water/acetonitrile/isopropanol with 5 mM ammonium acetate, 0.05% acetic acid, and 20 µM phosphoric acid (which was added to improve the peak shape of some phospholipid species)47. Solvent B consisted of 50:50 acetonitrile/isopropanol with 5 mM ammonium acetate and 0.05% acetic acid. Initial conditions were 99:1 A:B with a flow rate of 0.6 mL/min. Additional chromatographic and spectrometric conditions for both ion modes can be found in the SI. Sample preparation for lipidomic profiling was performed as described previously48, with minor modifications described in the SI.

Significant features identified by lipidomics were compared to the MycoMass database to identify any matches relating to metabolites of potential bacterial origin49.

Samples were prepared for 1H NMR spectroscopic analysis in accordance with sample preparation protocols previously validated for serum in the section of Computational and Systems Medicine at Imperial College50. Further details on the 1H NMR spectroscopic analysis can be found in the SI. Due to limited sample volumes, some samples could not be analysed using all four analytical assays.

Statistical analysis

Unsupervised principal component analysis (PCA) models, which do not include knowledge of the results of the reference standard, were produced to investigate whether there were any hard outliers, due to either analytical error or biological deviation. PCA is a multivariate projection method, used to extract and display systematic variation in a data matrix. The scores plots of the PCA models display correlations between the participants’ metabolic profiles, with points closer together representing more similar profiles, allowing groups and trends to be revealed51.
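To illustrate how PCA scores are obtained, the sketch below extracts only the first principal component of a small mean-centred matrix by power iteration on the covariance matrix. This is a teaching sketch under stated assumptions: the data are made up, the function names are hypothetical, and it is not the software or the scaling used in this study.

```python
import math


def mean_centre(X):
    """Subtract the column mean from every variable."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[value - m for value, m in zip(row, means)] for row in X]


def first_principal_component(X, n_iter=200):
    """Return (loadings, scores) for the first PC via power iteration."""
    Xc = mean_centre(X)
    n, p = len(Xc), len(Xc[0])
    # Sample covariance matrix of the variables (p x p).
    cov = [[sum(row[i] * row[j] for row in Xc) / (n - 1) for j in range(p)]
           for i in range(p)]
    # Start vector; a start exactly orthogonal to the first PC would not converge.
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm == 0.0:
            break
        v = [wi / norm for wi in w]
    # Scores are the projections of each (centred) sample onto the loadings.
    scores = [sum(a * b for a, b in zip(row, v)) for row in Xc]
    return v, scores


# Made-up data: five samples, two variables lying close to the line y = 2x.
toy = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.2], [5.0, 9.8]]
loadings, scores = first_principal_component(toy)
print(loadings)
print(scores)
```

Samples with similar profiles receive similar scores, which is why neighbouring points in a scores plot are read as metabolically similar participants.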

Orthogonal partial least squares-discriminant analysis (OPLS-DA) is an extension of PCA, also a multivariate modelling method, used to connect the metabonomic data to the class (diagnosis). It was used to predict the diagnosis of participants, identifying variables that discriminate between classes. OPLS-DA models were run using a Monte Carlo cross-validation strategy to avoid over-training and reliance on a single model52, with the average correlations of the projected scores and the data projected onto the spectrum for the 1H NMR spectroscopy data. MS data were treated in the same way.
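The Monte Carlo cross-validation idea can be sketched as repeated random train/test splits whose results are averaged. In the sketch below a nearest-centroid classifier is deliberately substituted for OPLS-DA to keep the example self-contained; the split fraction, the toy data and all function names are assumptions, and only the count of 500 iterations mirrors the text.

```python
import random


def fit_centroids(X, y):
    """Per-class mean vector (a simple stand-in for fitting an OPLS-DA model)."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the class with the nearest centroid (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], x))


def monte_carlo_cv(X, y, n_iterations=500, test_fraction=0.25, seed=0):
    """Mean held-out accuracy over repeated random train/test splits."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    n_test = max(1, round(test_fraction * len(X)))
    accuracies = []
    for _ in range(n_iterations):
        rng.shuffle(indices)
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        train_y = [y[i] for i in train_idx]
        if len(set(train_y)) < 2:
            continue  # skip splits that leave only one class for training
        model = fit_centroids([X[i] for i in train_idx], train_y)
        hits = sum(predict(model, X[i]) == y[i] for i in test_idx)
        accuracies.append(hits / n_test)
    if not accuracies:
        raise ValueError("no usable splits")
    return sum(accuracies) / len(accuracies)


# Made-up, well-separated toy data (NOT study data).
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [0.0, 0.3],
     [5.0, 5.1], [5.2, 4.9], [4.9, 5.0], [5.1, 5.2]]
y = ["other"] * 4 + ["TB"] * 4
mean_accuracy = monte_carlo_cv(X, y)
print(mean_accuracy)
```

Averaging over many splits means that no single partition of the data decides the reported performance, which is the protection against over-training described above.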

To account for the influence of bodyweight in the metabolic profiles (Supplementary Table 2), a resampling strategy was implemented during the modelling process of the OPLS-DA models. Using the distribution of the body weights of the TB cases, samples from children with other diseases were sampled with probabilities of each being selected dependent on their weight. Further information on the statistical analysis and metabolite identification can be found in the SI.
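One way such weight-dependent resampling could be implemented is sketched below. It is an assumption-laden illustration rather than the procedure in the SI: the TB case weights are summarised by a normal distribution, each control is given a selection probability proportional to that density at its own weight, and controls are drawn with replacement. All names and weights are hypothetical.

```python
import math
import random
import statistics


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def weight_matched_sample(case_weights, control_weights, n_draws, seed=0):
    """Draw control indices with probability tied to the case weight distribution."""
    mean = statistics.mean(case_weights)
    sd = statistics.stdev(case_weights)
    # Assumed weighting scheme: density of the case distribution at each control weight.
    selection_weights = [normal_pdf(w, mean, sd) for w in control_weights]
    rng = random.Random(seed)
    return rng.choices(range(len(control_weights)), weights=selection_weights, k=n_draws)


# Hypothetical body weights in kg (NOT study data).
case_weights = [10.0, 12.0, 14.0, 11.0, 13.0]
control_weights = [11.0, 12.0, 13.0, 30.0, 35.0]

drawn = weight_matched_sample(case_weights, control_weights, n_draws=15)
drawn_weights = [control_weights[i] for i in drawn]
print(drawn_weights)
```

In this toy example the two much heavier controls are effectively never selected, so the resampled control group has a weight distribution resembling that of the cases.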