Plasma lipid profiles discriminate bacterial from viral infection in febrile children

Fever is the most common reason that children present to Emergency Departments. Clinical signs and symptoms suggestive of bacterial infection are often non-specific, and there is no definitive test for the accurate diagnosis of infection. The ‘omics’ approaches to identifying biomarkers from the host-response to bacterial infection are promising. In this study, lipidomic analysis was carried out with plasma samples obtained from febrile children with confirmed bacterial infection (n = 20) and confirmed viral infection (n = 20). We show for the first time that bacterial and viral infections produce distinct profiles in the host lipidome. Some species of glycerophosphoinositol, sphingomyelin, lysophosphatidylcholine and cholesterol sulfate were higher in the confirmed virus infected group, while some species of fatty acids, glycerophosphocholine, glycerophosphoserine, lactosylceramide and bilirubin were lower in the confirmed virus infected group when compared with the confirmed bacterial infected group. A combination of three lipids achieved an area under the receiver operating characteristic (ROC) curve of 0.911 (95% CI 0.81 to 0.98). This pilot study demonstrates the potential of metabolic biomarkers to assist clinicians in distinguishing bacterial from viral infection in febrile children, to facilitate effective clinical management and to limit the inappropriate use of antibiotics.


Methods
Study population and sampling. The European Union Childhood Life-Threatening Infectious Disease Study (EUCLIDS) 23 prospectively recruited patients, aged from 1 month to 18 years, with sepsis or severe focal infection from 98 participating hospitals in the UK, Austria, Germany, Lithuania, Spain and the Netherlands between 2012 and 2015. Plasma and other biosamples were collected to investigate the underlying genetics, proteomics and metabolomics of children with severe infectious disease phenotype.
The Infections in Children in the Emergency Department (ICED) study aimed to define clinical features that would predict bacterial illness in children, together with proteomic, genomic and metabolomic patterns associated with infection. This study included children aged 0-16 years at Imperial College NHS Healthcare Trust, St Mary's Hospital, between June 2014 and March 2015 24 .
The population consisted of children (≤17 years old) presenting with fever ≥38 °C, with diverse clinical symptoms and a spectrum of pathogens. Both studies were approved by the local institutional review boards (ICED REC No 14/LO/0266 approved by NRES Committee London - Camden & Islington; EUCLIDS REC No 11/LO/1982 approved by NRES Committee London - Fulham). Written informed consent was obtained from parents and assent from children, where appropriate. All methods were performed in accordance with the relevant guidelines and regulations. For the EUCLIDS study, a common clinical protocol agreed by the EUCLIDS Clinical Network and approved by the Ethics Committee was implemented at all hospitals.
Patients were divided into confirmed bacterial (n = 20) and confirmed viral (n = 20) infection groups. The bacterial group consisted exclusively of patients with confirmed sterile site culture-positive bacterial infections, and the viral infection group consisted only of patients with culture, molecular or immunofluorescent-confirmed viral infection and no co-existing bacterial infection.
Blood samples were collected in tubes spray-coated with EDTA at, or as close as possible to, the time of presentation to hospital, and plasma was obtained by centrifugation of blood samples for 10 min at 1,300 g at 4 °C. Plasma was stored at −80 °C before being shipped on dry ice to Imperial College London for lipidomic analysis.
Lipidomic analysis. Lipidomic analysis was carried out as previously described 25 . Briefly, 50 µl of water were added to 50 µl of plasma, vortexed and shaken for 5 min at 1,400 rpm at 4 °C. Four hundred µl of isopropanol containing internal standards (9 in negative mode, 11 for positive mode covering 10 lipid sub-classes) were added for lipid extraction. Samples were shaken at 1,400 rpm for 2 hours at 4 °C then centrifuged at 3,800 g for 10 min. Two aliquots of 100 µl of the supernatant fluid were transferred to a 96-well plate for ultra-performance liquid chromatography (UPLC)-mass spectrometry (MS) lipidomics analysis in positive and negative mode.
Liquid chromatography separation was carried out using an Acquity UPLC system (Waters Corporation, USA) with an injection volume of 1 µl and 2 µl for positive and negative ESI, respectively. An Acquity UPLC BEH column (C8, 2.1 × 100 mm, 1.7 µm; Waters Corporation, USA) was used. Mobile phase A consisted of water/isopropanol/acetonitrile (2:1:1; v:v:v) with the addition of 5 mM ammonium acetate, 0.05% acetic acid and 20 µM phosphoric acid. Mobile phase B consisted of isopropanol/acetonitrile (1:1; v:v) with the addition of 5 mM ammonium acetate and 0.05% acetic acid. The flow rate was 0.6 ml/min with a total run time of 15 min. The gradient started at 1% mobile phase B for 0.1 min, followed by an increase to 30% mobile phase B from 0.1 to 2 min, and to 90% mobile phase B from 2 min to 11.5 min. The gradient was held at 99.99% mobile phase B between 12 and 12.55 min before returning to the initial condition for re-equilibration.
MS detection was achieved using a Xevo G2-S QTof mass spectrometer (Waters MS Technologies, UK) and data were acquired in both positive and negative modes. The MS setting was configured as follows: capillary voltage 2.0 kV for positive mode, 1.5 kV for negative mode, sample cone voltage 25 V, source offset 80, source temperature 120 °C, desolvation temperature 600 °C, desolvation gas flow 1000 L/h, and cone gas flow 150 L/h. Data were collected in centroid mode with a scan range of 50-2000 m/z and a scan time of 0.1 s. LockSpray mass correction was applied for mass accuracy using a 600 pg/µL leucine enkephalin (m/z 556.2771 in ESI+, m/z 554.2615 in ESI−) solution in water/acetonitrile (1:1; v/v) at a flow rate of 15 µl/min.

Spectral and statistical analysis.
A Study Quality Control sample (SQC) was prepared by pooling 25 µl of all samples. The SQC was diluted to seven different concentrations, extracted at the same ratio 1:4 with isopropanol and replicates acquired at each concentration at the beginning and end of the run. A Long-Term Reference sample (LTR, made up of pooled plasma samples from external sources) and the SQC were diluted with water (1:1; v:v) and 400 µL of isopropanol containing internal standards (the same preparation as for the study samples) and injected once every 10 study samples, with 5 samples between a LTR and a SQC. Deconvolution of the spectra was carried out using the XCMS package. Extracted metabolic features were subsequently filtered and only those present with a relative coefficient of variation less than 15% across all SQC samples were retained. Additionally, metabolic features that did not correlate with a coefficient greater than 0.9 in a serial dilution series of SQC samples were removed.
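The two feature filters described above (coefficient of variation across SQC injections, and correlation with the SQC dilution series) can be illustrated with a minimal Python sketch. It is not the processing code used in the study: the function names, the example intensities and the use of Pearson correlation are assumptions made for illustration, since the text does not state which correlation coefficient was applied.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Relative standard deviation (sample SD / mean) as a percentage."""
    return 100.0 * stdev(values) / mean(values)


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def keep_feature(sqc_intensities, dilution_levels, dilution_intensities,
                 max_cv=15.0, min_r=0.9):
    """Keep a feature only if it is reproducible across SQC injections
    (CV below max_cv %) and tracks the SQC dilution series (r above min_r)."""
    reproducible = coefficient_of_variation(sqc_intensities) < max_cv
    dilutes = pearson_r(dilution_levels, dilution_intensities) > min_r
    return reproducible and dilutes


# Hypothetical relative concentrations for a seven-point dilution series.
levels = [1.0, 0.8, 0.6, 0.4, 0.2, 0.1, 0.05]
good = keep_feature([100, 102, 98, 101, 99], levels,
                    [100, 80, 60, 40, 20, 10, 5])
noisy = keep_feature([100, 150, 60, 130, 80], levels,
                     [100, 80, 60, 40, 20, 10, 5])
flat = keep_feature([100, 102, 98, 101, 99], levels,
                    [50, 52, 49, 51, 50, 48, 51])
```

In this toy example only the first feature passes both filters; the second fails on reproducibility and the third does not respond to dilution.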
Multivariate data analysis was carried out using SIMCA-P 14.1 (Umetrics AB, Sweden). The dataset was pareto-scaled prior to principal component analysis (PCA) and orthogonal partial least squares discriminant analysis (OPLS-DA). While PCA is an unsupervised technique useful for observing inherent clustering and identifying potential outliers in the dataset, OPLS-DA is a supervised method in which data are modelled against a specific descriptor of interest (in this case viral vs. bacterial infection classes). As for all supervised methods, model validity and robustness must be assessed before results can be interpreted. For OPLS-DA, model quality was assessed by internal cross-validation (Q2Y-hat value) and permutation testing in which the true Q2Y-hat value is compared to 999 models with random permutations of class membership. For valid and robust models (positive Q2Y-hat and permutation p-value < 0.05), metabolic features responsible for class separation were identified by examining the corresponding S-plot (a scatter plot of model loadings and correlation to class) with a cut-off of 0.05.
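Pareto scaling and the logic of a label-permutation test can be sketched as follows. This is an illustration under stated assumptions rather than the SIMCA-P procedure: Pareto scaling is taken in its usual form (mean-centring, then division by the square root of the standard deviation), and a simple difference in group means stands in for the cross-validated Q2Y-hat statistic, which would require a full OPLS-DA implementation. All function names are hypothetical.

```python
import random
from statistics import mean, stdev


def pareto_scale(column):
    """Mean-centre a variable and divide by the square root of its SD."""
    m, s = mean(column), stdev(column)
    if s == 0:
        return [0.0 for _ in column]
    return [(v - m) / s ** 0.5 for v in column]


def mean_difference(values, labels):
    """Stand-in model statistic: absolute difference between class means."""
    a = [v for v, lab in zip(values, labels) if lab == 1]
    b = [v for v, lab in zip(values, labels) if lab == 0]
    return abs(mean(a) - mean(b))


def permutation_p_value(statistic, values, labels,
                        n_permutations=999, seed=0):
    """Fraction of label permutations whose statistic is at least as large
    as the observed one, counting the observed labelling itself."""
    rng = random.Random(seed)
    observed = statistic(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if statistic(values, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)


# Two clearly separated hypothetical groups of ten samples each.
values = list(range(1, 11)) + list(range(21, 31))
labels = [0] * 10 + [1] * 10
p = permutation_p_value(mean_difference, values, labels)
```

With 999 permutations the smallest attainable p-value is 0.001, so the reported threshold of p < 0.05 corresponds to fewer than 50 permuted models matching or exceeding the true model.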

Metabolite annotation.
Short-listed metabolic features were subjected to tandem mass spectrometry in order to obtain fragmentation patterns. Patterns were compared against metabolome databases (Lipidmaps, HMDB, Metlin). Isotopic distribution matching was also checked. In addition, when possible, the fragmentation patterns were matched against available authentic standards run under the same analytical setting for retention time and MS/MS patterns. Annotation levels, according to the Metabolomics Standards Initiative, are summarised in Table 1 26 .
Single feature ROC curve analysis. Analysis was performed with the web server MetaboAnalyst 4.0. Sensitivities and specificities of lipids and predicted probabilities for the correct classification were presented as Receiver Operating Characteristic (ROC) curves. The Area Under the Curve (AUC) represents the discriminatory power of the lipids, with values closer to 1 indicating better classification.
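For readers unfamiliar with ROC analysis, the relationship between sensitivity, specificity and the AUC can be shown with a short Python sketch. It is a generic illustration, not the MetaboAnalyst implementation; the function names and the example scores are hypothetical. Each threshold yields one point (1 − specificity, sensitivity), and the AUC is the area under those points by the trapezoidal rule.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per
    distinct threshold, starting from (0, 0). Label 1 marks a positive."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, lab in zip(scores, labels)
                 if s >= threshold and lab == 1)
        fp = sum(1 for s, lab in zip(scores, labels)
                 if s >= threshold and lab == 0)
        # sensitivity = tp / n_pos; specificity = 1 - fp / n_neg
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under the ROC points by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical abundance of one lipid in six samples (1 = bacterial).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
auc = auc_trapezoid(roc_points(scores, labels))
```

Here eight of the nine bacterial-viral sample pairs are ranked correctly, so the AUC is 8/9.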
Feature selection. An 'in-house' variable selection method, forward selection-partial least squares (FS-PLS; https://github.com/lachlancoin/fspls.git), was used to identify a small diagnostic signature for distinguishing bacterial and viral infections. FS-PLS identifies a small signature made up of non-correlated features. The first iteration of FS-PLS considers the levels of all features (N) and initially fits N univariate regression models. The regression coefficient for each model is estimated using the Maximum Likelihood Estimation (MLE) function, and the goodness of fit is assessed by a t-test. The variable with the highest MLE and smallest p-value is selected first (SV1). Before selecting which of the N-1 remaining variables to use next, the algorithm projects out the variation explained by SV1 using Singular Value Decomposition (SVD). The algorithm iteratively fits up to N-1 models, at each step projecting out the variation corresponding to the already selected variables, and selecting new variables based on the residual variation. Projecting out the variation of selected features ensures that the final features in the signature are not correlated with each other. The FS-PLS process terminates when the MLE p-value exceeds a pre-defined threshold (p_thresh). The final model includes regression coefficients for all selected variables. First, FS-PLS was applied to the abundance values for the short-listed metabolic features identified through OPLS-DA analysis.
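The select-then-project idea behind FS-PLS can be illustrated with a simplified Python sketch. It is not the fspls implementation and departs from the description above in several labelled ways: it uses ordinary least squares on a 0/1 outcome, removes explained variation by direct orthogonal projection rather than SVD, and stops on a t-statistic threshold (`t_threshold`, a hypothetical parameter) rather than a p-value, because the Python standard library has no Student t distribution.

```python
import math


def _centre(v):
    m = sum(v) / len(v)
    return [x - m for x in v]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def forward_select(features, outcome, t_threshold=2.0):
    """features: dict of name -> list of values; outcome: list of 0/1.
    Returns the names of the selected features, in selection order."""
    n = len(outcome)
    resid_y = _centre(outcome)
    resid_x = {name: _centre(col) for name, col in features.items()}
    selected = []
    while resid_x:
        dof = n - 2 - len(selected)
        if dof < 1:
            break
        best_name, best_t, best_beta = None, 0.0, 0.0
        for name, col in resid_x.items():
            sxx = _dot(col, col)
            if sxx < 1e-12:  # no variation left after projection
                continue
            beta = _dot(col, resid_y) / sxx
            rss = sum((y - beta * x) ** 2 for x, y in zip(col, resid_y))
            se = math.sqrt(rss / dof / sxx)
            t = abs(beta) / se if se > 0 else math.inf
            if t > best_t:
                best_name, best_t, best_beta = name, t, beta
        if best_name is None or best_t < t_threshold:
            break
        chosen = resid_x.pop(best_name)
        selected.append(best_name)
        sxx = _dot(chosen, chosen)
        # Project the chosen feature out of the outcome and of every
        # remaining feature, so later picks explain residual variation only.
        resid_y = [y - best_beta * x for x, y in zip(chosen, resid_y)]
        for name, col in resid_x.items():
            w = _dot(col, chosen) / sxx
            resid_x[name] = [c - w * x for x, c in zip(chosen, col)]
    return selected


# Hypothetical data: "b" duplicates the information in "a"; "c" is
# unrelated to the outcome.
a = [1.0, 1.2, 0.8, 1.1, 0.9, 3.0, 3.2, 2.8, 3.1, 2.9]
data = {"a": a, "b": [2 * v for v in a],
        "c": [5, 2, 2, 3, 3, 5, 2, 2, 3, 3]}
signature = forward_select(data, [0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
```

In this toy example only one of the two perfectly correlated features enters the signature, because projecting out the first leaves no residual variation in the second.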
An individual's age and sex can impact upon their metabolome greatly. Limma 27 was used for differential abundance analysis to identify metabolites that are associated with age or sex. Features were considered to be associated with age or sex if they achieved an FDR p-value lower than 0.05. FS-PLS was re-applied to the dataset after having removed these features and the resulting signature was compared to the signatures from the full dataset.
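The false discovery rate step can be illustrated with a short Python sketch of the Benjamini-Hochberg adjustment. This assumes that the FDR p-values refer to Benjamini-Hochberg adjusted values, which the text does not state explicitly; the function name and the example p-values are hypothetical, and the study itself used Limma rather than this code.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five features tested against age.
raw = [0.001, 0.008, 0.039, 0.041, 0.6]
adj = benjamini_hochberg(raw)
age_associated = [i for i, q in enumerate(adj) if q < 0.05]
```

In this example four raw p-values fall below 0.05 but only the first two features remain below 0.05 after adjustment.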
www.nature.com/scientificreports
The disease risk score (DRS) was calculated for the 3-metabolite signature. The DRS translates the abundance values of the features in the signature into a single value, indicating the disease group of the sample 11 .
The sensitivity and specificity of the lipid signature were presented as a receiver operating characteristic (ROC) curve with the 95% confidence regions calculated through bootstrap analysis with 500 iterations.
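A percentile bootstrap for the AUC with 500 iterations can be sketched in Python as follows. This is an illustration under assumptions, not the analysis code used in the study: the resampling scheme (samples drawn with replacement, resamples lacking one class discarded), the percentile method and the function names are all choices made here, and the input scores stand in for the DRS values.

```python
import random


def auc_rank(scores, labels):
    """Probability that a random positive outscores a random negative
    (ties count as one half)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(scores, labels, n_iterations=500, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    rng = random.Random(seed)
    n = len(scores)
    aucs = []
    while len(aucs) < n_iterations:
        idx = [rng.randrange(n) for _ in range(n)]
        boot_labels = [labels[i] for i in idx]
        if len(set(boot_labels)) < 2:
            continue  # a resample needs both classes to define an AUC
        aucs.append(auc_rank([scores[i] for i in idx], boot_labels))
    aucs.sort()
    lower = aucs[int((alpha / 2) * n_iterations)]
    upper = aucs[int((1 - alpha / 2) * n_iterations) - 1]
    return lower, upper


# Hypothetical risk scores for six samples (1 = bacterial).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
point_estimate = auc_rank(scores, labels)
lower, upper = bootstrap_auc_ci(scores, labels)
```

With groups as small as those in a pilot study, the bootstrap interval is wide, which is one reason validation in a larger cohort matters.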

Results
Patient characteristics. Baseline characteristics of patients with definitive bacterial and definitive viral infection are summarised in Table 2. When selecting patient samples, patient characteristics were matched as closely as possible so that no particular factor would confound the model. There was no significant difference in age between the two groups (p = 0.97). Both groups had a similar sex split. Seven patients from the definitive bacterial infection group and 6 from the definitive viral infection group were admitted to the Paediatric Intensive Care Unit (PICU). A range of pathogens was present in each group.
Plasma lipidome can differentiate bacterial from viral infection. PCA was conducted first to evaluate the data, visualise dominant patterns, and identify outliers within populations (Fig. 1). The same outlier sample was present in both negative (Fig. 1A) and positive (Fig. 1B) polarity datasets and, as such, was removed from subsequent analysis. SQC samples were tightly grouped together in the PCA scatter plot, indicating minimal analytical variability throughout the run.
OPLS-DA, a supervised multivariate method, was carried out on both positive and negative polarity datasets. In the positive polarity mode no model was successfully built to distinguish between viral and bacterial infection groups (data not shown). However, in the negative polarity dataset, an OPLS-DA model separated bacterial infected samples from viral infected samples (with 3891 features). The robustness of the model was characterised by R2X (cum) = 0.565, R2Y-hat (cum) = 0.843 and Q2Y-hat (cum) = 0.412 and permutation p-value = 0.01 (999 tests). The cross-validated scores plot using the whole lipidome dataset indicated that bacterial infected samples were more prone to misclassification than viral infected samples (Fig. 2).
Lipid changes were not the same in the bacterial and viral infected groups. Metabolic features contributing to the separation of the model are plotted in Fig. 3. Some species of glycerophosphoinositol, monoacylglycerophosphocholine, sphingomyelin and sulfatide were higher in the viral group when compared to the bacterial group, while some species of fatty acids, glycerophosphocholine, glycerophosphoserine and lactosylceramide were higher in bacterial infection when compared with viral infection. Bilirubin and cholesterol sulfate, although not lipids, were detected by lipidomic analysis; these were higher in the bacterial and viral groups, respectively, than in the other group.
FS-PLS was initially carried out on the abundance values for all 28 shortlisted features. A signature was identified made up of the following 3 lipids: SHexCer(d42:3), PC(16:0/16:0) and LacCer(d18:1/24:1). The impacts of age and sex on the feature selection process were explored. With a false discovery rate (FDR) of 0.05, 5 out of the 28 features were identified as being significantly differentially abundant between samples above or below the median age. None of the 28 features were identified as being significantly differentially abundant between males and females. The 5 features that were associated with age were removed and FS-PLS was re-run on the filtered dataset. The same 3-metabolite signature (SHexCer(d42:3), PC(16:0/16:0), LacCer(d18:1/24:1)) was identified, showing that the signature is robust to age effects.

Evaluation of diagnostic potential of metabolic biomarkers. ROC curve analysis was
This signature achieved an improved ROC curve with AUC of 0.911 (95% confidence interval: 0.81-0.98) when compared with those generated from individual lipids. The ROC curve and confidence intervals calculated through bootstrapping are shown in Fig. 5. Figure 6 shows the disease risk scores for definitive bacterial and definitive viral samples with points overlaid to indicate the sex or age (above or below median) of the sample.

Discussion
We have shown that differences in the host lipidome are induced by bacterial and viral infections. While differences in host responses between viral and bacterial infections have been previously reported, for example as differential expression of proteins, RNAs and levels of metabolites [9][10][11][12][13][14]20 , there have been no claims in relation to lipidome changes in carefully phenotyped samples. Although age is known to affect metabolism 28 , it is important to note that the metabolic changes associated with infection described herein were consistent among samples from patients whose age ranged from 1 month to 9 years old. Some species of glycerophosphoinositol, sphingomyelin, lysophosphatidylcholine and cholesterol sulfate were higher in the confirmed virus-infected group when compared with the bacterial infected group, while some species of fatty acids, glycerophosphocholine, glycerophosphoserine, lactosylceramide and bilirubin were higher in cases with confirmed bacterial infection when compared with viral infection.
Pharmacological inhibition of fatty acid biosynthesis suppressed viral replication for both HCMV and influenza A virus 29 . The importance of fatty acid biosynthesis may reflect its essential role in viral envelopment during viral replication. Rhinovirus induced metabolic reprogramming in host cells by increasing glucose uptake, and indicated a shift towards lipogenesis and/or fatty acid uptake 30 . In our study, the fatty acids linoleic acid (FA 18:2), palmitic acid (FA 16:0), oleic acid (FA 18:1) and palmitoleic acid (FA 16:1) were lower in viral infection when compared to bacterial infection, which may reflect enhanced lipogenesis and fatty acid uptake in the host cell during viral replication.
The increase in cholesterol sulfate observed may reflect changes in cellular lipid biosynthesis and T cell signalling during viral infection. Cholesterol sulfate is believed to play a key role as a membrane stabiliser 31 and can also act to modulate cellular lipid biosynthesis 32 and T cell receptor signal transduction 33 . Gong et al. demonstrated that cholesterol sulfate was elevated in the serum of piglets infected with swine fever virus 34 . Taken together, these observations indicate that this compound could be a marker of viral infection.
Higher levels of sphingomyelin SM(d18:1/24:1), SM(d18:1/23:0) and SM(d18:1/24:0), and lysophosphocholine LPC(16:0) upon viral infection may also be linked to viral replication in infected cells. Accumulation of cone-shaped lipids, such as LPC, in one leaflet of the membrane bilayer induces the membrane curvature required for virus budding 35 . It is known that viral replication, for example in the case of dengue virus, induces dramatic changes in infected cells, including in sphingomyelin levels, to alter the curvature and permeability of membranes 36 . Furthermore, the altered levels of sphingomyelin can be partially explained by elevated cytokine levels during bacterial infection, such as TNF-α 37 , which can activate sphingomyelinase, hydrolysing sphingomyelin to ceramide 38 . Hence, sphingomyelin may be a class of lipids that plays a role in both viral and bacterial infection.
Lactosylceramide LacCer(d18:1/24:1) and LacCer(d18:1/16:0) were higher in bacterial infection in comparison to viral infection. Lactosylceramide, found in microdomains on the plasma membrane of cells, is a glycosphingolipid consisting of a hydrophobic ceramide lipid and a hydrophilic sugar moiety. Lactosylceramide plays an important role in bacterial infection by serving as a pattern recognition receptor (PRR) to detect pathogen-associated molecular patterns (PAMPs). Lactosylceramide containing a long C24 fatty acid chain, such as LacCer(d18:1/24:1), which was increased in our study, is essential for formation of LacCer-Lyn complexes on neutrophils, which function as signal transduction platforms for αMβ2 integrin-mediated phagocytosis 39 .
Other lipids that were changed in our study, such as sulfatides and glycerophosphocholines, may also play an important role in bacterial infection. Sulfatides are multifunctional molecules involved in various biological processes, including immune system regulation and infection 40 . Sulfatides can act as glycolipid receptors that attach bacteria, such as Escherichia coli 41 , Mycoplasma hyopneumoniae 42 and Pseudomonas aeruginosa 43 , to mucosal surfaces. Five glycerophosphocholine species, PC(16:0/18:2), PC(18:0/18:1), PC(18:0/18:2), PC(16:0/16:0) and PC(16:0/18:1), were higher in bacterial infected samples when compared with viral infected samples. Glycerophosphocholine was elevated in a lipidomics study of plasma from tuberculosis patients 44 ; however, the exact role of glycerophosphocholine remains elusive. Bilirubin is detected as a consequence of the breadth of lipidome coverage, and its role in infection is unclear. The lipid species identified in this study present an opportunity for further mechanistic study to understand the host responses in bacterial or viral infection.
A combination of three lipids achieved a strong area under the receiver operating characteristic (ROC) curve of 0.911 (95% CI 0.81 to 0.98). Similar approaches have been taken using routine laboratory parameters and more recently gene expression where 2-gene transcripts achieved an ROC AUC of 0.95 (95% CI 0.94-1) 11 . The relevance of our data is that they provide the potential for a rapid diagnostic test with which clinicians could distinguish bacterial from viral infection in febrile children.
The study has limitations. Firstly, we were unable to annotate 4 of the 29 discriminatory features, of which two were assigned only a broad lipid class by identifying the head group (PE). The unknown feature with m/z of 239.157 achieved the second highest AUC for ROC curve analysis on an individual basis. The unknown identity prevents this feature from being a potential marker and hinders biological understanding. This feature, however, was not included in the final 3-lipid panel that gave the highest AUC. Secondly, the sample size in this pilot study is small. Validation studies using quantitative assays are now required to confirm the findings. In addition, in larger validation studies, we will look into the signatures of specific pathogens, and potentially co-infection by multiple pathogens.

Figure 6. The DRS was calculated using abundance values from the 3-metabolite signature identified by FS-PLS. Plot A shows points coloured according to the sex of the sample and plot B shows points coloured according to whether the sample was above or below the median age (9 months).