Introduction

Bacterial sepsis is a serious and frequent complication among preterm infants that increases mortality and morbidity.1,2 Furthermore, sepsis caused by low-virulence microorganisms such as coagulase-negative Staphylococcus (CONS) has been associated with an increased risk of cerebral palsy and neurodevelopmental impairment.3 The characteristics of the causative agent have prognostic implications: infections due to Gram-negative microorganisms have higher mortality rates (10–40%) than those due to Gram-positive ones (7–27%).1 Blood culture is the gold standard for the diagnosis of sepsis; however, it does not provide clinicians with reliable results before 36 h. Furthermore, multiplex polymerase chain reaction (PCR) assays for bacterial detection have limitations, such as the risk of contamination, lack of information about antibiotic resistance, and high costs, that have limited their diagnostic applicability.4 Therefore, rapid diagnosis of sepsis and identification of the causative agent remain a challenge.5

Genome-wide expression profiles have been successfully harnessed for the diagnosis of sepsis in preterm infants.6,7,8,9,10 Furthermore, gene expression patterns have been able to differentiate acute infections caused by respiratory viruses and to distinguish between Gram-positive and Gram-negative bacteria in the pediatric and adult populations.11,12 In this regard, in vitro studies showed differential expression in neonatal monocytes in response to Staphylococcus epidermidis or Escherichia coli.13 However, such transcriptomic analyses have not yet been conducted in septic preterm infants.

We aimed to assess whether there were substantial differences in the transcriptomic profiles of sepsis caused by Gram-positive compared to Gram-negative bacteria in preterm infants.

Patients and methods

Study design and patients’ samples

A prospective, observational, double-cohort, single-center study was conducted at the University and Polytechnic Hospital La Fe (Valencia, Spain). During a 30-month period, very low birth weight (VLBW) infants (birth weight <1500 g) with risk factors and clinical signs of sepsis (Table 1) were screened for recruitment. The Institutional Review Board of our hospital approved the study, and informed consent was signed by the parents of all patients.

Table 1 Risk factors and clinical signs of sepsis.

VLBW infants without clinical signs of sepsis, matched with cases on gestational age, birth weight, sex, ethnicity, type of delivery, antenatal steroids, age at diagnosis, and clinical status, were enrolled as a control group.

To reflect the severity of sepsis, we recorded information regarding respiratory and hemodynamic instability, mortality, and specific interventions such as endotracheal intubation, mechanical ventilation, blood transfusions, volume expansion, and/or inotropic support. Patients transfused prior to blood sampling were excluded.

Diagnosis of sepsis

We included both late-onset (LOS) and early-onset (EOS) sepsis. We performed blood cultures (BacT/Alert® PF; Biomérieux®, Durham, NC) in all patients with suspected sepsis before starting antibiotics. Two positive blood cultures were required for the diagnosis of CONS sepsis. Cultures obtained from other sites were not considered for the diagnosis of sepsis.

RNA isolation and microarray hybridization

Venous blood (0.5 mL) was obtained from cases at the time of blood culture and from matched controls during routine blood sampling, always before the initiation of antibiotics. Samples were mixed with 1 mL of RNA-stabilizing solution (Tempus™ Blood RNA tubes, Applied Biosystems®, Foster City, CA) and stored at −20 °C until further processing. Total RNA was isolated using the MagMAX™ RNA isolation kit (Ambion/Applied Biosystems, Foster City, CA) according to the manufacturer’s specifications. RNA integrity was assessed using the Agilent 2100 Bioanalyzer (Agilent, Palo Alto, CA). Hybridization was performed if the RNA integrity number was ≥5.5.

Transcriptomic studies were performed using one patient sample per GeneChip Human Gene 1.0 ST Array (Affymetrix®, Santa Clara, CA). The array comprised more than 750,000 unique 25-mer oligonucleotide features constituting 33,297 well-annotated genes.

Data analyses and statistics

Raw data (.CEL files) from microarrays were analyzed and statistically filtered using Partek Genomics Suite 6.6 (Partek Inc., St. Louis, MO) software. Input files were normalized with the robust multi-array average (RMA) algorithm for gene arrays on core meta-probe sets. Data were explored by principal components analysis (PCA), and one-way analysis of variance (ANOVA) was performed across all samples. Genes differentially expressed between the Gram-positive and Gram-negative sepsis groups were identified using an ANOVA model with a false discovery rate (FDR) < 0.05.
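To make the FDR threshold concrete, the following is a minimal sketch of a false discovery rate correction applied to per-gene P values. It assumes the Benjamini–Hochberg step-up procedure; the text does not state which FDR method was used, and the P values below are hypothetical, not study data.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic adjusted values
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-gene ANOVA p-values (illustration only)
p_values = [0.001, 0.008, 0.039, 0.041, 0.60]
q_values = benjamini_hochberg(p_values)
# A gene is called differentially expressed when its adjusted value is < 0.05
significant = [q < 0.05 for q in q_values]
```

In this toy example only the first two genes pass the FDR < 0.05 threshold, although four of the five raw P values are below 0.05.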

A partial least-squares discriminant analysis (PLS-DA) was conducted using the PLS Toolbox 8.7 (Eigenvector Research Inc., Wenatchee, WA) running in Matlab 2019b (Mathworks Inc., Natick, MA).14 The multivariate model was cross-validated using venetian blinds to evaluate the accuracy of the classification model. The results of cross-validation were evaluated by the Q2 (R2CV) and R2Y (RMSCV) parameters.15,16

Double cross-validation (2CV) and permutation testing were employed to assess the generalization accuracy of the PLS-DA models.17,18 In 2CV, a subset of samples is set aside by random k-fold cross-validation (k = 9 in this study) as a validation set, and the remaining samples are split again into training and test subsets by leave-one-out CV for the optimization of the PLS-DA classifier used to predict the test samples. The procedure is repeated until all samples have been included once in the test set, and then an estimate of the PLS-DA discrimination between classes is calculated using the resulting set of predictions. The procedure was repeated five times to average out the effect of the initial random k-fold CV on the results. Permutation testing is based on comparing the predictive performance of a PLS-DA model built with the real class assignments to that of a number of models calculated after random permutation of the class labels (1000 in this study).
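The logic of permutation testing can be illustrated with a short sketch. Note the assumptions: a nearest-centroid classifier with leave-one-out cross-validation stands in for the PLS-DA model (it is not the method used in the study), the data are hypothetical two-feature samples, and only 200 permutations are run to keep the example fast.

```python
import random


def nearest_centroid_loo_accuracy(samples, labels):
    """Leave-one-out accuracy of a nearest-centroid classifier (stand-in for PLS-DA)."""
    n_correct = 0
    for held_out in range(len(samples)):
        # Build class centroids from every sample except the held-out one
        centroids = {}
        for label in sorted(set(labels)):
            rows = [samples[i] for i in range(len(samples))
                    if i != held_out and labels[i] == label]
            if rows:
                centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        # Predict the class whose centroid is closest (squared Euclidean distance)
        predicted = min(
            centroids,
            key=lambda c: sum((x - m) ** 2
                              for x, m in zip(samples[held_out], centroids[c])),
        )
        if predicted == labels[held_out]:
            n_correct += 1
    return n_correct / len(samples)


def permutation_test(samples, labels, n_permutations=200, seed=0):
    """Compare accuracy with real labels to accuracies with shuffled labels."""
    rng = random.Random(seed)
    observed = nearest_centroid_loo_accuracy(samples, labels)
    at_least_as_good = 0
    for _ in range(n_permutations):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        if nearest_centroid_loo_accuracy(samples, shuffled) >= observed:
            at_least_as_good += 1
    # Add-one correction so the estimated p-value is never exactly zero
    p_value = (at_least_as_good + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical, well-separated toy data (not study data)
samples = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.4, 0.1], [0.2, 0.4],
           [5.0, 5.1], [5.2, 4.9], [4.8, 5.0], [5.1, 5.3]]
labels = ["gram_pos"] * 6 + ["gram_neg"] * 4
observed_accuracy, p_value = permutation_test(samples, labels)
```

A small P value means that models trained on scrambled labels rarely match the performance obtained with the real labels, which is the argument used to rule out chance correlations.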

The RNA microarray probes that contributed most to class separation were identified from the PLS-DA Variable Importance in Projection (VIP) scores, using a cut-off value of >5, since probes with high VIP scores provide better class separation.19 The receiver-operating characteristic (ROC) curve was calculated for the variables included in the predictive model. Finally, the selected differentially expressed genes were imported into Pathway Studio version 12 (Pathway Studio® software, Elsevier® Inc., Rockville, MD) to identify biological processes differing between groups.
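For reference, the VIP score is commonly defined as shown below. This is the standard textbook definition and is given here as an assumption; the text does not state the exact formula implemented in the software used. Here p is the number of variables (probes), A the number of latent variables, w_{ja} the weight of variable j in latent variable a, and SSY_a the sum of squares of the class variable explained by latent variable a.

```latex
\mathrm{VIP}_j = \sqrt{\, p \,
  \frac{\sum_{a=1}^{A} \mathrm{SSY}_a \left( \dfrac{w_{ja}}{\lVert \mathbf{w}_a \rVert} \right)^{2}}
       {\sum_{a=1}^{A} \mathrm{SSY}_a} }
```

Under this definition the squared VIP scores average to 1 across all variables, so a value of 1 is a frequently used threshold; the cut-off of >5 applied here is considerably more stringent.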

The Kolmogorov–Smirnov test was used to assess whether the clinical and demographic data were normally distributed. Continuous variables were expressed as mean ± SD or median with interquartile range, depending on data distribution. Categorical variables were compared using the χ² or Fisher’s exact test (two-tailed). Two-tailed Student’s t test or Mann–Whitney U test was used to compare two groups, and ANOVA or Kruskal–Wallis test to compare more than two groups, as appropriate. Data analysis was performed using SPSS® version 17.0 (SPSS Inc., Chicago, IL). Significance was considered for a P value ≤0.05.

Real-time PCR analyses and statistical analysis

We validated the most significantly differentially overexpressed genes by single real-time reverse-transcription PCR (RT-PCR) using TaqMan® Gene Expression Assays probes (Applied Biosystems®, Foster City, CA) according to the manufacturer’s recommendations.

Ready-to-use primers and probes from the assay-on-demand service of Applied Biosystems were used for the quantification of selected target genes: CD37, CSK, MAN2B2, MGAT1, MOB3A, MYO9B, SH2D3C, and TEP1 (Hs01099648_m1, Hs01062585_m1, Hs00392270_m1, Hs00383009_m1, Hs00926925_m1, Hs00188109_m1, Hs01552509_m1, and Hs00200091_m1, respectively) and endogenous reference gene 18S (Hs99999901_s1). RNA samples were reverse transcribed using random hexamers and MultiScribe reverse transcriptase (Applied Biosystems®, Foster City, CA). After complementary DNA synthesis, RT-PCR was carried out using the QuantStudio 5 Real-Time PCR System (Thermo Fisher Scientific). Samples were run in triplicate, and fold changes were generated for each sample by calculating 2^(−ΔΔCT).20
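The 2^(−ΔΔCT) calculation can be sketched as follows. The CT values are hypothetical, and the choice of calibrator group is an assumption made for illustration, since the text does not state which group served as calibrator; 18S is the reference gene named above.

```python
from statistics import mean


def fold_change(ct_target, ct_reference, ct_target_calibrator, ct_reference_calibrator):
    """Relative expression by the 2^(-ddCT) method from replicate CT values."""
    # dCT: target gene CT normalized to the endogenous reference gene (18S)
    delta_ct_sample = mean(ct_target) - mean(ct_reference)
    delta_ct_calibrator = mean(ct_target_calibrator) - mean(ct_reference_calibrator)
    # ddCT: sample dCT relative to the calibrator dCT
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2 ** (-delta_delta_ct)


# Hypothetical triplicate CT values (illustration only, not study data)
fc = fold_change(
    ct_target=[24.0, 24.2, 23.8],             # target gene, sample of interest
    ct_reference=[10.0, 10.1, 9.9],           # 18S, sample of interest
    ct_target_calibrator=[26.0, 26.1, 25.9],  # target gene, assumed calibrator
    ct_reference_calibrator=[10.0, 10.0, 10.0],
)
```

Here ΔCT is 14 in the sample and 16 in the calibrator, so ΔΔCT = −2 and the target gene is about fourfold more expressed in the sample.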

Results were expressed as mean ± standard deviation (SD). To compare results between groups, two-tailed Student’s t test or Mann–Whitney tests were used. Data analysis was performed using GraphPad Prism version 8.0.0 for Windows (GraphPad Software, San Diego, CA, www.graphpad.com). Significance was considered for a P value ≤0.05.

Results

Patients’ characteristics and microbiological data

A total of 115 eligible neonates were included. Of those, 35 were excluded due to poor RNA integrity, leaving 29 controls and 51 cases with suspected sepsis. Of the latter, only 25 had a positive blood culture; of these, 17 had a Gram-positive and 8 a Gram-negative causative microorganism.

No statistically significant demographic or clinical differences were observed between patients with Gram-positive and Gram-negative sepsis (Table 2). However, patients with Gram-negative sepsis had a worse clinical course, requiring invasive mechanical ventilation, transfusions, and inotropic support more frequently than patients with Gram-positive sepsis (Table 2). None of the patients died.

Table 2 Demographic, clinical, and follow-up characteristics of very low birth weight infants with Gram-positive sepsis (n = 17) and Gram-negative sepsis (n = 8).

The isolated pathogens were: S. epidermidis, n = 11 (44%); S. aureus, n = 4 (16%); E. coli, n = 3 (12%); Enterococcus faecalis, n = 2 (8%); Serratia marcescens, n = 1 (4%); Klebsiella pneumoniae, n = 1 (4%); Klebsiella oxytoca, n = 1 (4%); Enterobacter cloacae, n = 1 (4%); and Morganella morganii, n = 1 (4%).

Transcriptomic profile analysis in VLBW infants with Gram-positive and Gram-negative sepsis

Unsupervised PCA showed the distribution of the whole transcriptome of the 25 neonates with sepsis: 17 with Gram-positive and 8 with Gram-negative cultures (Fig. 1). Pronounced differences in the gene expression pattern were reflected in two clearly distinct clusters corresponding to Gram-positive and Gram-negative sepsis. This clustering was independent of whether the sepsis was EOS or LOS.

Fig. 1: Three-dimensional principal component analysis (PCA), with mean centering and scaling, based on the complete genome.

Individual patients are plotted based on their respective positions along the three axes. PCA shows 19 controls (C) in green balloons and 25 sepsis-positive patients: 17 with Gram-positive sepsis (gram+) in red balloons and 8 with Gram-negative sepsis (gram–) in purple balloons.

The first three principal components (PCs) accounted for 55.6% of the variability. One-way ANOVA between the two septic groups identified 10,682 differentially expressed genes (FDR < 0.05) between Gram-positive and Gram-negative sepsis (44.7% upregulated and 55.3% downregulated).

Predictive model

PLS-DA was conducted. The first two latent variables (LVs) were used to build the discriminating model (Fig. 2a). The scores plot and the ROC analysis showed an excellent separation between Gram-positive and Gram-negative samples. The area under the ROC curve (AUC) was 1, with a misclassification rate of 0%, sensitivity of 100%, and specificity of 100% (Fig. 2b). In addition, the high values of the model’s goodness-of-fit metrics (R2Y = 0.72, Q2 = 0.73) indicated robustness and reproducibility.
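As an illustration of how these classification metrics are obtained from model scores, the sketch below computes the AUC (as the proportion of correctly ordered positive/negative pairs), sensitivity, and specificity. The scores and the 0.5 threshold are hypothetical and are not taken from the study's PLS-DA model.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve as the fraction of correctly ranked pairs."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(scores_pos) * len(scores_neg))


def sensitivity_specificity(scores_pos, scores_neg, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    true_pos = sum(1 for s in scores_pos if s >= threshold)
    true_neg = sum(1 for s in scores_neg if s < threshold)
    return true_pos / len(scores_pos), true_neg / len(scores_neg)


# Hypothetical, perfectly separated scores (AUC = 1, as reported for the model)
separated_auc = auc([0.9, 0.8, 0.7], [0.3, 0.2])
sens, spec = sensitivity_specificity([0.9, 0.8, 0.7], [0.3, 0.2], threshold=0.5)

# Hypothetical overlapping scores for contrast
overlap_auc = auc([0.9, 0.4], [0.5, 0.1])
```

An AUC of 1 means every sample of one class scores higher than every sample of the other; with small groups such as 17 versus 8 samples, this is the kind of result that the cross-validation and permutation tests are meant to check.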

Fig. 2: Discriminative model between gram-positive and gram-negative sepsis.

a The first two partial least squares components, which explain 46.53% of the total variation, were used to build the discriminating model. Results are presented as a score scatter plot in which an excellent separation was observed between the 17 gram-positive samples (plotted as red squares) and the 8 gram-negative samples (plotted as purple squares). b ROC analysis for discrimination between gram-positive and gram-negative samples constructed using selected genes (blue line: estimated values; green line: cross-validated values; dashed line: ROC = 0.5; red circles: optimal cut-offs).

Due to the small sample size, the use of an external validation set for model validation was not feasible. Instead, a statistical validation (2CV) was performed, providing external figures of merit for an objective assessment of generalization accuracy. The 2CV-PLS-DA procedure yielded predicted values for the classification of samples collected from Gram-positive and Gram-negative patients. The statistical significance of the class separations obtained after 2CV was assessed by comparing the distributions of the quality parameters “number of misclassifications,” “Q2,” and “area under the ROC curve” (AUROC) obtained using the real class assignments with those from re-estimations after permutation testing (Supplementary file S1). Scrambled models provided significantly worse figures of merit than the original model, supported by calculated P values ≤0.02; hence, class separation due to chance correlations can be ruled out.

The variables contributing most to the group separation were identified using the VIP scores (Supplementary file S2). A total of 23 microarray probes corresponding to 21 genes with VIP scores >5 were selected and are summarized in Table 3. One of the probes did not have an assigned gene, and the gene CCDC72 was represented by two probes. Table 3 shows relevant information such as the gene symbol, the gene common name, the P value of the ANOVA comparison between both groups, and the fold change of these selected genes.

Table 3 List of 21 genes derived from the predictive model resulting from genome-wide expression profile analysis in VLBW infants with sepsis caused by Gram-positive versus Gram-negative bacteria.

RT-PCR validation

The genes most significantly overexpressed in Gram-positive compared with Gram-negative sepsis, from both a statistical and a biological point of view (CD37, CSK, MAN2B2, MGAT1, MOB3A, MYO9B, SH2D3C, and TEP1), were validated by RT-PCR (Fig. 3).

Fig. 3: Boxplots represent the mean and standard error of mRNA expression (2^(−ΔΔCt)) in gram-positive and gram-negative sepsis for each gene.

CD37 molecule (CD37), C-src tyrosine kinase (CSK), Mannosidase, alpha, class 2B, member 2 (MAN2B2), Mannosyl (alpha-1,3-)-glycoprotein beta-1,2-N-acetylglucosaminyltransferase (MGAT1), MOB1, Mps One Binder kinase activator-like 2A (yeast) (MOB3A), Myosin IXB (MYO9B), SH2 domain containing 3C (SH2D3C), and Telomerase-associated protein 1 (TEP1). P values were calculated using Student’s t test.

Biological analysis

The most significantly differentially overexpressed genes were integrated into a biological analysis using Pathway Studio software. Following the one-way ANOVA with 10,682 differentially expressed genes (FDR < 0.05) between Gram-positive and Gram-negative sepsis, we discarded genes present in any contrast with controls, and finally, 719 genes were used for the biological analysis.

The most relevant biological processes are summarized in Table 4. In Fig. 4, we link the overexpressed genes with the differential biological processes. Cytokines are represented as a node that regulates many of the relevant biological processes described in our study: inflammatory response, immune response, cell death, apoptosis, cell migration, cell differentiation, and metabolic response, among others. No direct relationships were found for two genes (MAN2B2 and MOB3A).

Table 4 The most relevant biological processes filtered by enrichment P value and number of genes that are involved in each signaling or pathway.
Fig. 4: Model explaining changes of overexpressed genes in Gram-positive versus Gram-negative neonates (in red) linked with significant biological processes shown in Table 4.

Solid lines indicate expression and dashed lines indicate regulation. Relations are colored by effect: red indicates a negative effect, green a positive effect, and gray an unknown effect.

Discussion

Neonatal sepsis is a life-threatening condition with a special impact on very preterm neonates, who have an immature immune system.2 Previously, we described a gene signature of sepsis in VLBW infants.6 Here, we sought to identify a differential pattern able to discriminate between Gram-positive and Gram-negative bacteria in VLBW septic infants. Access to reliable biomarkers would allow specific therapy to be targeted. This individualized approach could avoid the unnecessary use of antibiotics and their side effects, and reduce induced resistance.21

To our knowledge, this is the first study that highlights differential transcriptomic profiles between VLBW infants diagnosed with Gram-positive and Gram-negative sepsis. Hilgendorff et al.,9 in a transcriptomic study conducted in preterm infants with EOS, described two subclasses of sepsis with differentially expressed genes involved in T cell proliferation, neutrophil activation, natural killer cell activation, hypoxia-induced signaling, and carbohydrate metabolism. Nevertheless, the authors defined sepsis only by clinical and analytical criteria, and no microbiologic data were provided to determine whether the differences were related to the type of microorganism.

We included both early and late sepsis in both groups to avoid the effect of postnatal age on the response to sepsis previously described by Wynn et al.10 They found significant differences in gene expression between early and late sepsis, although the top five canonical pathways were identical. However, they did not report the microorganisms isolated in either group, which could explain the differences in gene expression between them.

Our results show that the expression of 10,682 genes changed depending on the Gram staining of the causal agent. Furthermore, we identified eight overexpressed genes that differentiated the two groups of septic VLBW infants, specifically CD37, CSK, TEP1, MGAT1, SH2D3C, MYO9B, MAN2B2, and MOB3A. Several of the genes overexpressed in Gram-positive sepsis, such as CD37, CSK, MGAT1, and SH2D3C, are involved in metabolic processes and in immune and inflammatory responses, and could explain the better clinical outcomes.

CD37 is a leukocyte-specific protein of the tetraspanin superfamily implicated in the humoral and cellular immune responses. Its underexpression has been associated with a poor immune response.22 We found an overexpression of CD37 in Gram-positive sepsis. Overexpression of CD37 could be involved in a specific mechanism for cellular activation and differentiation similar to that occurring in Candida albicans infections.23 Conversely, underexpression of CD37 could be associated with the worse clinical outcome seen in sepsis caused by Gram-negative bacteria in VLBW infants. Another discriminant gene is CSK, which plays a significant role in the immune response by promoting cell differentiation and migration.24,25,26,27,28 Chow and Veillette24 showed that the balance between Src-related kinases and p50csk may be a major determinant of the immune response. Thus, CSK could be helpful to evaluate the degree of aggressiveness of the infection and to discriminate between Gram-positive and Gram-negative sepsis. This hypothesis has already been corroborated in a Csk-deficient model29,30 in which the immune response could be blunted.

Moreover, both genes, CD37 and CSK, have been implicated in cell death inhibition,31,32,33 and their overexpression in VLBW infants with Gram-positive sepsis could constitute a protective factor. Furthermore, overexpression of the MGAT1 gene (alpha-1,3-mannosyl-glycoprotein 2-beta-N-acetylglucosaminyltransferase) in Gram-positive septic neonates implies the triggering of effective cell migration, inflammatory response, and immune response.34,35 In addition, SH2D3C (also known as NSP3) is a MAP kinase substrate that plays an important role as a regulator of lymphocyte adhesion in the integrin activation pathway. The overexpression of this gene, which is involved in the apoptotic process,36 could imply a better response against infection in the Gram-positive group.

Cytokines have a crucial role in the complex pathophysiology underlying sepsis.6 They are important mediators in the metabolic response to stress or infection and efficiently coordinate the defense mechanisms against invading pathogens.37,38,39,40,41

In the present study, we have shown that Gram-positive and Gram-negative bacteria elicit different host responses. Our results show that VLBW infants with sepsis caused by Gram-positive bacteria activate metabolic and immunomodulating responses that translate into a balanced clinical pro- and anti-inflammatory response. The overexpression of genes such as CD37 and CSK effectively attenuates cytokine production and cell death in Gram-positive as compared to Gram-negative infection. The attenuation of the cytokine-mediated inflammatory response in infections caused by Gram-positive bacteria could explain the lower incidence of septic shock and subsequent mortality.

Our study has some limitations. The sample size was small; however, we were able to find a distinct signature between groups. We validated the significant genes by RT-PCR and also with a mathematical model; nevertheless, further validation in independent cohorts is needed to ensure the reproducibility of these results.

In addition, we found a higher proportion of EOS in the Gram-negative group, although this difference did not reach statistical significance. In previous studies, postnatal age appeared to influence the host response to sepsis.7,10 However, in our PCA, patients’ samples clustered according to whether sepsis was Gram-negative or Gram-positive. Importantly, EOS and LOS cases were distributed indistinctly within the two main Gram clusters. Therefore, we attributed the differences in gene expression pattern to the causative agent and not to the timing of sepsis diagnosis.

We conclude that transcriptomic analysis may be helpful to differentiate between Gram-positive and Gram-negative infections. In addition, it may help to design more specifically targeted, personalized, and effective therapies.