Introduction

Fever is one of the most common reasons that children, especially those under 5 years of age, present to hospital emergency departments in England1 and in the US2. Serious bacterial infection accounts for 5–15% of the febrile children presenting3,4,5, and most cases originating from a viral aetiology are self-limiting. Currently, bacterial infection is confirmed by positive microbiological culture of a sterile sample (blood, clean-catch urine or cerebrospinal fluid (CSF)). However, this can take 24–48 hours and is compounded by high false-negative4,6 rates and by false-positive7 rates caused by contaminating pathogens. Molecular detection of specific pathogens is an option, but results can be confounded by co-infections, and samples need to be obtained from the site of infection, which can be both invasive and impractical8. Because it is challenging for paediatricians to differentiate between bacterial and viral infection in acute illness, antibiotics are often prescribed as a precautionary measure, contributing to the rise of antimicrobial resistance.

Reliable biomarkers that distinguish bacterial from viral infection are urgently needed to support good clinical management and reduce antibiotic use. Host biomarkers, i.e. the physiological changes of the host in response to a specific pathogen, have untapped diagnostic potential, and their discovery can be accelerated by the advances in ‘omics’ research, especially in the fields of transcriptomics9,10,11,12 and proteomics13,14,15. Metabolomics has the added advantage that it is considered to most closely reflect the native phenotype and functional state of a biological system. One in vivo animal study revealed that distinct metabolic profiles can be derived from mice infected with different bacteria16, and several similar studies focusing on meningitis have shown that metabolic profiling of CSF can differentiate between meningitis and negative controls17, as well as between viral and bacterial meningitis18. Mason et al.19 demonstrated the possibility of diagnosis and prognosis of tuberculous meningitis with non-invasive urinary metabolic profiles. Metabolic changes in urine can be used to differentiate children with respiratory syncytial virus (RSV) from healthy controls, as well as from those with bacterial causes of respiratory distress20.

Lipids are essential structural components of cell membranes and energy storage molecules. Thanks to the advances in lipidomics, a subset of metabolomics, lipids and lipid mediators have been increasingly recognised to play a crucial role in different metabolic pathways and cellular functions, particularly in immunity and inflammation21,22. However, the potential of lipidomics to distinguish bacterial from viral infection in febrile children has never been explored.

In this study, we undertook a lipidomic analysis of plasma taken from febrile children with confirmed bacterial infection (n = 20) and confirmed viral infection (n = 20) as a proof-of-concept study. We show that bacterial and viral infection produce distinct profiles in the plasma lipids of febrile children that might be exploited diagnostically.

Methods

Study population and sampling

The European Union Childhood Life-Threatening Infectious Disease Study (EUCLIDS)23 prospectively recruited patients, aged from 1 month to 18 years, with sepsis or severe focal infection from 98 participating hospitals in the UK, Austria, Germany, Lithuania, Spain and the Netherlands between 2012 and 2015. Plasma and other biosamples were collected to investigate the underlying genetics, proteomics and metabolomics of children with severe infectious disease phenotype.

The Infections in Children in the Emergency Department (ICED) study aimed to define clinical features that would predict bacterial illness in children and patterns of proteomics, genomics and metabolomics associated with infections. This study included children aged 0–16 years at Imperial College NHS Healthcare Trust, St Mary’s Hospital, between June 2014 and March 201524.

The population consisted of children (≤17 years old) presenting with fever ≥38 °C, with diverse clinical symptoms and a spectrum of pathogens. Both studies were approved by the local institutional review boards (ICED REC No 14/LO/0266 approved by NRES Committee London – Camden & Islington; EUCLIDS REC No 11/LO/1982 approved by NRES Committee London – Fulham). Written informed consent was obtained from parents and assent from children, where appropriate. All methods were performed in accordance with the relevant guidelines and regulations. For the EUCLIDS study, a common clinical protocol agreed by EUCLIDS Clinical Network and approved by the Ethics Committee was implemented at all hospitals.

Patients were divided into confirmed bacterial (n = 20) and confirmed viral (n = 20) infection groups. The bacterial group consisted exclusively of patients with confirmed sterile site culture-positive bacterial infections, and the viral infection group consisted only of patients with culture-, molecular- or immunofluorescence-confirmed viral infection and no co-existing bacterial infection.

Blood samples were collected in tubes spray-coated with EDTA at, or as close as possible to, the time of presentation to hospital, and plasma was obtained by centrifugation of blood samples for 10 min at 1,300 g at 4 °C. Plasma was stored at −80 °C before being shipped on dry ice to Imperial College London for lipidomic analysis.

Lipidomic analysis

Lipidomic analysis was carried out as previously described25. Briefly, 50 µl of water was added to 50 µl of plasma, and the mixture was vortexed and shaken for 5 min at 1,400 rpm at 4 °C. Four hundred µl of isopropanol containing internal standards (9 for negative mode and 11 for positive mode, covering 10 lipid sub-classes) was added for lipid extraction. Samples were shaken at 1,400 rpm for 2 hours at 4 °C then centrifuged at 3,800 g for 10 min. Two aliquots of 100 µl of the supernatant fluid were transferred to a 96-well plate for ultra-performance liquid chromatography (UPLC)–mass spectrometry (MS) lipidomics analysis in positive and negative mode.

Liquid chromatography separation was carried out using an Acquity UPLC system (Waters Corporation, USA) with an injection volume of 1 µl and 2 µl for positive and negative ESI, respectively. An Acquity UPLC BEH column (C8, 2.1 × 100 mm, 1.7 µm; Waters Corporation, USA) was used for the separation. Mobile phase A consisted of water/isopropanol/acetonitrile (2:1:1; v:v:v) with the addition of 5 mM ammonium acetate, 0.05% acetic acid and 20 µM phosphoric acid. Mobile phase B consisted of isopropanol/acetonitrile (1:1; v:v) with the addition of 5 mM ammonium acetate and 0.05% acetic acid. The flow rate was 0.6 ml/min with a total run time of 15 min. The gradient started at 1% mobile phase B for 0.1 min, followed by an increase to 30% mobile phase B from 0.1 to 2 min, and to 90% mobile phase B from 2 min to 11.5 min. The gradient was held at 99.99% mobile phase B between 12 and 12.55 min before returning to the initial condition for re-equilibration.

MS detection was achieved using a Xevo G2-S QTof mass spectrometer (Waters MS Technologies, UK) and data were acquired in both positive and negative modes. The MS settings were as follows: capillary voltage 2.0 kV for positive mode and 1.5 kV for negative mode, sample cone voltage 25 V, source offset 80, source temperature 120 °C, desolvation temperature 600 °C, desolvation gas flow 1000 L/h, and cone gas flow 150 L/h. Data were collected in centroid mode with a scan range of 50–2000 m/z and a scan time of 0.1 s. LockSpray mass correction was applied for mass accuracy using a 600 pg/µL leucine enkephalin (m/z 556.2771 in ESI+, m/z 554.2615 in ESI−) solution in water/acetonitrile (1:1; v/v) at a flow rate of 15 µl/min.

Spectral and statistical analysis

A Study Quality Control sample (SQC) was prepared by pooling 25 µl of all samples. The SQC was diluted to seven different concentrations and extracted at the same 1:4 ratio with isopropanol, and replicates were acquired at each concentration at the beginning and end of the run. A Long-Term Reference sample (LTR, made up of pooled plasma samples from external sources) and the SQC were diluted with water (1:1; v:v) and 400 µL of isopropanol containing internal standards (the same preparation as for the study samples) and injected once every 10 study samples, with 5 samples between an LTR and an SQC. Deconvolution of the spectra was carried out using the XCMS package. Extracted metabolic features were subsequently filtered, and only those with a relative coefficient of variation of less than 15% across all SQC samples were retained. Additionally, metabolic features that did not correlate with a coefficient greater than 0.9 in a serial dilution series of SQC samples were removed.
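
The two quality filters described above can be sketched as follows. This is a minimal illustration and not the pipeline used in the study: the function names are hypothetical, the use of the sample standard deviation for the coefficient of variation is an assumption, and Pearson correlation is assumed for the dilution series because the type of correlation coefficient is not stated.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, as a fraction."""
    m = mean(values)
    return stdev(values) / m if m != 0 else float("inf")


def pearson(x, y):
    """Pearson correlation coefficient (assumed; the text does not name the type)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def keep_feature(sqc_intensities, dilution_levels, dilution_intensities,
                 max_cv=0.15, min_r=0.9):
    """Keep a feature only if it is reproducible across SQC injections
    (CV < 15%) and tracks the SQC dilution series (r > 0.9)."""
    reproducible = coefficient_of_variation(sqc_intensities) < max_cv
    tracks_dilution = pearson(dilution_levels, dilution_intensities) > min_r
    return reproducible and tracks_dilution
```

A feature whose intensity is stable across repeated SQC injections and scales with the dilution level passes both filters; failing either test removes it.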

Multivariate data analysis was carried out using SIMCA-P 14.1 (Umetrics AB, Sweden). The dataset was pareto-scaled prior to principal component analysis (PCA) and orthogonal partial least squares discriminant analysis (OPLS-DA). While PCA is an unsupervised technique useful for observing inherent clustering and identifying potential outliers in the dataset, OPLS-DA is a supervised method in which data are modelled against a specific descriptor of interest (in this case viral vs. bacterial infection classes). As for all supervised methods, model validity and robustness must be assessed before results can be interpreted. For OPLS-DA, model quality was assessed by internal cross-validation (Q2Y-hat value) and permutation testing in which the true Q2Y-hat value is compared to 999 models with random permutations of class membership. For valid and robust models (positive Q2Y-hat and permutation p-value < 0.05), metabolic features responsible for class separation were identified by examining the corresponding S-plot (a scatter plot of model loadings and correlation to class) with a cut-off of 0.05.
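
Two of the steps above lend themselves to a short sketch: pareto scaling (mean-centring each feature and dividing by the square root of its standard deviation) and the empirical p-value from a permutation test. The function names are hypothetical, and the p-value formula below, which counts permuted models that score at least as well as the true model, is a common convention assumed here; it is not necessarily how SIMCA-P reports the result.

```python
from statistics import mean, stdev


def pareto_scale(column):
    """Mean-centre one feature and divide by the square root of its SD."""
    m = mean(column)
    s = stdev(column)
    if s == 0:
        return [0.0 for _ in column]
    return [(v - m) / s ** 0.5 for v in column]


def permutation_p_value(observed_q2, permuted_q2):
    """Empirical p-value: share of permuted-label models whose Q2 is at
    least as high as the observed one (with the usual +1 correction)."""
    at_least_as_good = sum(1 for q in permuted_q2 if q >= observed_q2)
    return (at_least_as_good + 1) / (len(permuted_q2) + 1)
```

Under this convention, 999 permutations give a smallest attainable p-value of 0.001.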

Metabolite annotation

Short-listed metabolic features were subjected to tandem mass spectrometry in order to obtain fragmentation patterns. Patterns were compared against metabolome databases (Lipidmaps, HMDB, Metlin). Isotopic distribution matching was also checked. In addition, when possible, the fragmentation patterns were matched against available authentic standards run under the same analytical setting for retention time and MS/MS patterns. Annotation levels, assigned according to the Metabolomics Standards Initiative26, are summarised in Table 1.

Table 1 Metabolic features changed in bacterial and viral groups.

Single feature ROC curve analysis

Analysis was performed with the web server MetaboAnalyst 4.0. Sensitivities and specificities of lipids and predicted probabilities for the correct classification were presented as Receiver Operating Characteristic (ROC) curves. The Area Under the Curve (AUC) represents the discriminatory power of the lipids, with values closer to 1 indicating better classification.
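
For intuition, the AUC of a single feature equals the probability that a randomly chosen sample from one class has a higher value than a randomly chosen sample from the other class, with ties counted as one half. The sketch below computes it directly from that definition; it is an illustration with a hypothetical function name, not the MetaboAnalyst implementation.

```python
def roc_auc(scores_positive, scores_negative):
    """AUC as the fraction of positive/negative pairs ranked correctly
    (ties count as half a correctly ranked pair)."""
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))
```

An AUC of 0.5 corresponds to no discrimination, and a feature that is consistently lower in the positive class gives a value below 0.5.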

Feature selection

An ‘in-house’ variable selection method, forward selection-partial least squares (FS-PLS; https://github.com/lachlancoin/fspls.git), was used to identify a small diagnostic signature for distinguishing bacterial and viral infections. FS-PLS identifies a small signature made up of non-correlated features. The first iteration of FS-PLS considers the levels of all features (N) and initially fits N univariate regression models. The regression coefficient for each model is estimated using the Maximum Likelihood Estimation (MLE) function, and the goodness of fit is assessed by a t-test. The variable with the highest MLE and smallest p-value is selected first (SV1). Before selecting which of the N-1 remaining variables to use next, the algorithm projects out the variation explained by SV1 using Singular Value Decomposition (SVD). The algorithm iteratively fits up to N-1 models, at each step projecting out the variation corresponding to the already selected variables, and selecting new variables based on the residual variation. Projecting out the variation of selected features ensures that the final features in the signature are not correlated with each other. The FS-PLS process terminates when the MLE p-value exceeds a pre-defined threshold (pthresh). The final model includes regression coefficients for all selected variables. First, FS-PLS was applied to the abundance values for the short-listed metabolic features identified through OPLS-DA analysis.
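
The select-then-project-out loop described above can be illustrated with a deliberately simplified sketch. It is not the FS-PLS code from the repository: the function names are hypothetical, the selection criterion is the absolute correlation with the residual class label rather than a regression p-value, the stopping rule is a correlation threshold (`min_abs_r`) rather than pthresh, and the projection uses a single Gram-Schmidt step rather than SVD.

```python
from statistics import mean


def centre(v):
    m = mean(v)
    return [x - m for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def corr(a, b):
    """Correlation of two already-centred vectors (0 if either is ~zero)."""
    denom = (dot(a, a) * dot(b, b)) ** 0.5
    return dot(a, b) / denom if denom > 1e-12 else 0.0


def project_out(v, basis):
    """Remove from v the component explained by basis."""
    bb = dot(basis, basis)
    if bb <= 1e-12:
        return list(v)
    k = dot(v, basis) / bb
    return [x - k * b for x, b in zip(v, basis)]


def forward_select(features, labels, min_abs_r=0.3, max_features=5):
    """features: dict of name -> values; labels: 0/1 class per sample."""
    residual = {name: centre(vals) for name, vals in features.items()}
    y_res = centre(labels)
    selected = []
    while residual and len(selected) < max_features:
        best = max(residual, key=lambda n: abs(corr(residual[n], y_res)))
        if abs(corr(residual[best], y_res)) < min_abs_r:
            break  # nothing left explains the remaining variation
        chosen = residual.pop(best)
        selected.append(best)
        # Project the chosen feature out of the label and the other features
        y_res = project_out(y_res, chosen)
        residual = {n: project_out(v, chosen) for n, v in residual.items()}
    return selected
```

Because each chosen feature is projected out of the remaining ones, a feature that merely duplicates an earlier pick is left with no residual variation and is never selected, which is the behaviour the text attributes to FS-PLS.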

An individual’s age and sex can greatly affect their metabolome. Limma27 was used for differential abundance analysis to identify metabolites that are associated with age or sex. Features were considered to be associated with age or sex if they achieved an FDR-adjusted p-value lower than 0.05. FS-PLS was re-applied to the dataset after these features had been removed, and the resulting signature was compared to the signature from the full dataset.
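
The FDR adjustment can be sketched as below, assuming the Benjamini-Hochberg procedure; the text does not name the adjustment method, so this choice and the function name are assumptions made for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Features whose adjusted value falls below 0.05 would then be flagged as associated with the covariate and removed before re-running the selection.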

The disease risk score (DRS) was calculated for the 3-metabolite signature. The DRS translates the abundance values of the features in the signature into a single value, indicating the disease group of the sample11.
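
One simple way to collapse a signature into a single score is sketched below. It assumes that the score is the summed abundance of features that are higher in one group minus the summed abundance of features that are higher in the other; the exact formula is not given in the text, so this form, the function name and the feature names in the test are all assumptions made for illustration.

```python
def disease_risk_score(abundances, up_features, down_features):
    """Sum of 'up' feature abundances minus sum of 'down' feature abundances.

    abundances: dict of feature name -> abundance for one sample.
    up_features / down_features: hypothetical lists of feature names that
    are higher / lower in the group of interest.
    """
    up = sum(abundances[f] for f in up_features)
    down = sum(abundances[f] for f in down_features)
    return up - down
```

Samples can then be classified by comparing the score with a threshold chosen from the ROC curve.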

The sensitivity and specificity of the lipid signature were presented as a receiver operating characteristic (ROC) curve with the 95% confidence regions calculated through bootstrap analysis with 500 iterations.
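
A bootstrap confidence interval for the AUC can be sketched as follows. This is an illustration under stated assumptions: samples are resampled with replacement within each class, the interval is taken from the percentiles of the bootstrap AUCs, and the function names and the fixed seed are hypothetical. The study reports confidence regions for the whole curve, which this sketch does not reproduce.

```python
import random


def roc_auc(scores_positive, scores_negative):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(scores_positive) * len(scores_negative))


def bootstrap_auc_ci(scores_positive, scores_negative,
                     n_iter=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (resampling within class)."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(n_iter):
        pos = [rng.choice(scores_positive) for _ in scores_positive]
        neg = [rng.choice(scores_negative) for _ in scores_negative]
        aucs.append(roc_auc(pos, neg))
    aucs.sort()
    lower = aucs[int((alpha / 2) * n_iter)]
    upper = aucs[min(n_iter - 1, int((1 - alpha / 2) * n_iter))]
    return lower, upper
```

With 20 samples per group, intervals obtained this way are expected to be wide, which is consistent with the pilot nature of the study.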

Results

Patient characteristics

The baseline characteristics of patients with definitive bacterial and definitive viral infection are summarised in Table 2. When selecting patient samples, patient characteristics were matched as much as possible to ensure that no particular factor would confound the model. There was no significant difference in age between the two groups (p = 0.97). Both groups had a similar sex split. Seven patients from the definitive bacterial infection group and 6 from the definitive viral infection group were admitted to the Paediatric Intensive Care Unit (PICU). A range of pathogens was present in each group.

Table 2 Demographic and clinical patient characteristics.

Plasma lipidome can differentiate bacterial from viral infection

PCA was conducted first to evaluate the data, visualise dominant patterns, and identify outliers within populations (Fig. 1). The same outlier sample was present in both negative (Fig. 1A) and positive (Fig. 1B) polarity datasets and was therefore removed from subsequent analysis. SQC samples were tightly grouped together in the PCA scatter plot, indicating minimal analytical variability throughout the run.

Figure 1

Principal components analysis (PCA) of lipidomics dataset. (A) Scatter plot of PCA model from data acquired in negative polarity mode. (B) Scatter plot of PCA model from data acquired in positive polarity mode. Quality control samples are shown in red, bacterial infected samples are shown in blue and viral infected samples shown in green.

OPLS-DA, a supervised method, was carried out on both positive and negative polarity datasets. In the positive polarity mode, no model was successfully built to distinguish between viral and bacterial infection groups (data not shown). However, in the negative polarity dataset, an OPLS-DA model separated bacterial infected samples from viral infected samples (with 3891 features). The robustness of the model was characterised by R2X (cum) = 0.565, R2Y-hat (cum) = 0.843, Q2Y-hat (cum) = 0.412 and a permutation p-value of 0.01 (999 tests). The cross-validated scores plot using the whole lipidome dataset indicated that bacterial infected samples were more prone to misclassification than viral infected samples (Fig. 2).

Figure 2

Scatter plot of the cross-validated score vectors showing the separation of definitive bacterial infected samples (green dots) from definitive viral infected samples (blue dots).

Lipid changes were not the same in the bacterial and viral infected groups

Metabolic features contributing to the separation of the model are plotted in Fig. 3 and summarised in Table 1. Some species of glycerophosphoinositol, monoacylglycerophosphocholine, sphingomyelin and sulfatide were higher in the viral group when compared to the bacterial group, while some species of fatty acids, glycerophosphocholine, glycerophosphoserine and lactosylceramide were higher in bacterial infection when compared with viral infection. Bilirubin and cholesterol sulfate, although not lipids, were detected by lipidomic analysis, and these were higher in the bacterial and viral groups when compared to the other group, respectively.

Figure 3

Manhattan-style plot of the 3891 lipid features detected by lipid-positive mode UPLC-MS with 40 features showing a significant association with infection type (as determined by model S-plot) highlighted and annotated. Y axis Sign(p) x P is the loadings of the OPLS-DA (i.e. modelled covariance p[1]). *Cholesterol sulfate – isomers due to different position of the sulfate.

Evaluation of diagnostic potential of metabolic biomarkers

ROC curve analysis was performed to evaluate the diagnostic potential of these lipids in distinguishing bacterial from viral infection. Out of all discriminatory lipids, PC (16:0/16:0), unknown feature m/z 239.157 and PE (16:0/18:2) generated the highest AUCs of 0.774 (CI, 0.6–0.902), 0.721 (CI, 0.545–0.871) and 0.705 (CI, 0.52–0.849), respectively (Fig. 4).

Figure 4

Receiver operator characteristic (ROC) analysis based on single lipids. ROC curve analysis of the top 3 lipids, PC (16:0/16:0) (A), unknown feature (m/z 239.157) (B) and PE (16:0/18:2) (C), which gave the highest Area Under the Curve (AUC) values.

FS-PLS was initially carried out on the abundance values for all 28 shortlisted features. A signature was identified that was made up of the following 3 lipids: SHexCer(d42:3); PC (16:0/16:0); and LacCer(d18:1/24:1). The impacts of age and sex on the feature selection process were explored. With a false discovery rate (FDR) of 0.05, 5 out of the 28 features were identified as being significantly differentially abundant between samples above or below the median age. None of the 28 features were identified as being significantly differentially abundant between males and females. The 5 features that were associated with age were removed and FS-PLS was re-run on the filtered dataset. The same 3-metabolite signature (SHexCer(d42:3), PC(16:0/16:0), LacCer(d18:1/24:1)) was identified, showing that the signature is robust to age effects.

This signature achieved an improved ROC curve with AUC of 0.911 (95% confidence interval: 0.81–0.98) when compared with those generated from individual lipids. The ROC curve and confidence intervals calculated through bootstrapping are shown in Fig. 5. Figure 6 shows the disease risk scores for definitive bacterial and definitive viral samples with points overlaid to indicate the sex or age (above or below median) of the sample.

Figure 5

Receiver operator characteristic (ROC) analysis based on 3-lipid signature. A combination of SHexCer(d42:3), PC (16:0/16:0) and LacCer(d18:1/24:1) achieved AUC of 0.911 (CI 95% 0.81–0.98).

Figure 6

Boxplots comparing the Disease Risk Score (DRS) for definitive bacterial and definitive viral samples. The DRS was calculated using abundance values from the 3-metabolite signature identified by FS-PLS. Plot A shows points coloured according to the sex of the sample and plot B shows points coloured according to whether the sample was above or below the median age (9 months).

Discussion

We have shown that differences in the host lipidome are induced by bacterial and viral infections. While differences in host responses between viral and bacterial infections have been previously reported, for example as differential expression of proteins, RNAs and levels of metabolites9,10,11,12,13,14,20, there have been no claims in relation to lipidome changes in carefully-phenotyped samples. Although age is known to affect metabolism28, it is important to note that the metabolic changes associated with infection described herein were consistent among samples from patients whose age ranged from 1 month to 9 years old.

Some species of glycerophosphoinositol, sphingomyelin, lysophosphatidylcholine and cholesterol sulfate were higher in the confirmed virus-infected group when compared with bacterial infected group, while some species of fatty acids, glycerophosphocholine, glycerophosphoserine, lactosylceramide and bilirubin were higher in cases with confirmed bacterial infection when compared with viral infection.

The important effects of infection on fatty acid metabolism have been highlighted by Munger et al., who demonstrated that human cytomegalovirus (HCMV) up-regulated fatty acid biosynthesis in infected host cells. Pharmacological inhibition of fatty acid biosynthesis suppressed viral replication for both HCMV and influenza A virus29. The importance of fatty acid biosynthesis may reflect its essential role in viral envelopment during viral replication. Rhinovirus induced metabolic reprogramming in host cells, increasing glucose uptake and indicating a shift towards lipogenesis and/or fatty acid uptake30. In our study, the fatty acids linoleic acid (FA 18:2), palmitic acid (FA 16:0), oleic acid (FA 18:1) and palmitoleic acid (FA 16:1) were lower in viral infection when compared to bacterial infection, which may reflect enhanced lipogenesis and fatty acid uptake in the host cell during viral replication.

The increase in cholesterol sulfate observed may reflect changes in cellular lipid biosynthesis and T cell signalling during viral infection. Cholesterol sulfate is believed to play a key role as a membrane stabiliser31 and can also act to modulate cellular lipid biosynthesis32 and T cell receptor signal transduction33. Gong et al. demonstrated that cholesterol sulfate was elevated in the serum of piglets infected with swine fever virus34. Taken together, these observations indicate that this compound could be a marker of viral infection.

Higher levels of the sphingomyelins SM(d18:1/24:1), SM(d18:1/23:0) and SM(d18:1/24:0), and of lysophosphocholine LPC (16:0), upon viral infection may also be linked to viral replication in infected cells. Accumulation of cone-shaped lipids, such as LPC, in one leaflet of the membrane bilayer induces the membrane curvature required for virus budding35. It is known that viral replication, for example in the case of dengue virus, induces dramatic changes in the lipids of infected cells, including sphingomyelin, to alter the curvature and permeability of membranes36. Furthermore, the altered levels of sphingomyelin can be partially explained by elevated cytokine levels during bacterial infection, such as TNF-α37, which can activate sphingomyelinase, hydrolysing sphingomyelin to ceramide38. Hence, sphingomyelin may be a class of lipids that plays a role in both viral and bacterial infection.

Lactosylceramide LacCer(d18:1/24:1) and LacCer (d18:1/16:0) were higher in bacterial infection in comparison to viral infection. Lactosylceramide, found in microdomains on the plasma membrane of cells, is a glycosphingolipid consisting of a hydrophobic ceramide lipid and a hydrophilic sugar moiety. Lactosylceramide plays an important role in bacterial infection by serving as a pattern recognition receptor (PRR) that detects pathogen-associated molecular patterns (PAMPs). Lactosylceramide containing a long C24 fatty acid chain, such as LacCer(d18:1/24:1), which was increased in our study, is essential for the formation of LacCer-Lyn complexes on neutrophils, which function as signal transduction platforms for αMβ2 integrin-mediated phagocytosis39.

Other lipids that were changed in our study, such as sulfatides and glycerophosphocholines, may also play an important role in bacterial infection. Sulfatides are multifunctional molecules involved in various biological processes, including immune system regulation and infection40. Sulfatides can act as glycolipid receptors that attach bacteria, such as Escherichia coli41, Mycoplasma hyopneumoniae42 and Pseudomonas aeruginosa43, to mucosal surfaces. Five glycerophosphocholine species, PC(16:0/18:2), PC(18:0/18:1), PC(18:0/18:2), PC(16:0/16:0) and PC(16:0/18:1), were higher in bacterial infected samples when compared with viral infected samples. Glycerophosphocholine was elevated in a lipidomics study looking at plasma from tuberculosis patients44; however, the exact role of glycerophosphocholine remains elusive. Bilirubin was detected as a consequence of the breadth of lipidome coverage, and its role in infection is unclear. The lipid species identified in this study present an opportunity for further mechanistic study to understand the host responses in bacterial or viral infection.

A combination of three lipids achieved a strong area under the receiver operating characteristic (ROC) curve of 0.911 (95% CI 0.81 to 0.98). Similar approaches have been taken using routine laboratory parameters and more recently gene expression where 2-gene transcripts achieved an ROC AUC of 0.95 (95% CI 0.94-1)11. The relevance of our data is that they provide the potential for a rapid diagnostic test with which clinicians could distinguish bacterial from viral infection in febrile children.

The study has limitations. Firstly, we were unable to annotate 4 of the 29 discriminatory features, of which two were assigned only a broad lipid class by identifying the head group (PE). The unknown feature with m/z of 239.157 achieved the second highest AUC in ROC curve analysis on an individual basis. Its unknown identity prevents this feature from being a potential marker and hinders biological understanding. This feature, however, was not included in the final 3-lipid panel that gave the highest AUC. Secondly, the sample size in this pilot study is small. Validation studies using quantitative assays are now required to confirm the findings. In addition, in larger validation studies, we will look into the signatures of specific pathogens, and potentially of co-infection by multiple pathogens.

This is the first lipidomics study carried out on plasma taken from febrile children for the purpose of distinguishing bacterial from viral infection. It demonstrates the potential of this approach to facilitate effective clinical management by rapidly diagnosing bacterial infection in paediatrics.