A community approach to mortality prediction in sepsis via gene expression analysis

Sweeney, Timothy E.; Perumal, Thanneer M.; Henao, Ricardo; Nichols, Marshall; Howrylak, Judith A.; Choi, Augustine M.; Bermejo-Martin, Jesús F.; Almansa, Raquel; Tamayo, Eduardo; Davenport, Emma E.; Burnham, Katie L.; Hinds, Charles J.; Knight, Julian C.; Woods, Christopher W.; Kingsmore, Stephen F.; Ginsburg, Geoffrey S.; Wong, Hector R.; Parnell, Grant P.; Tang, Benjamin; Moldawer, Lyle L.; Moore, Frederick E.; Omberg, Larsson; Khatri, Purvesh; Tsalik, Ephraim L.; Mangravite, Lara M.; Langley, Raymond J.

doi:10.1038/s41467-018-03078-2

Article
Open access
Published: 15 February 2018

Timothy E. Sweeney ORCID: orcid.org/0000-0002-3596-1093^1,2^na1^nAff25,
Thanneer M. Perumal ORCID: orcid.org/0000-0003-1168-8982³^na1,
Ricardo Henao ORCID: orcid.org/0000-0003-4980-845X^4,5,
Marshall Nichols⁴,
Judith A. Howrylak⁶,
Augustine M. Choi⁷,
Jesús F. Bermejo-Martin⁸,
Raquel Almansa⁸,
Eduardo Tamayo⁸,
Emma E. Davenport ORCID: orcid.org/0000-0002-8768-346X^9,10,11,
Katie L. Burnham¹²,
Charles J. Hinds ORCID: orcid.org/0000-0001-5094-8324¹³,
Julian C. Knight¹²,
Christopher W. Woods^4,14,15,
Stephen F. Kingsmore¹⁶,
Geoffrey S. Ginsburg⁴,
Hector R. Wong^17,18,
Grant P. Parnell¹⁹,
Benjamin Tang^19,20,21,22,
Lyle L. Moldawer²³,
Frederick E. Moore²³,
Larsson Omberg³,
Purvesh Khatri ORCID: orcid.org/0000-0002-4143-4708^1,2^na2,
Ephraim L. Tsalik ORCID: orcid.org/0000-0002-6417-2042^4,14,15^na2,
Lara M. Mangravite ORCID: orcid.org/0000-0001-7841-3612³^na2 &
Raymond J. Langley ORCID: orcid.org/0000-0001-8849-9325²⁴

Nature Communications volume 9, Article number: 694 (2018)

Abstract

Improved risk stratification and prognosis prediction in sepsis is a critical unmet need. Clinical severity scores and available assays such as blood lactate reflect global illness severity with suboptimal performance, and do not specifically reveal the underlying dysregulation of sepsis. Here, we present prognostic models for 30-day mortality generated independently by three scientific groups by using 12 discovery cohorts containing transcriptomic data collected from primarily community-onset sepsis patients. Predictive performance is validated in five cohorts of community-onset sepsis patients in which the models show summary AUROCs ranging from 0.765 to 0.89. Similar performance is observed in four cohorts of hospital-acquired sepsis. Combining the new gene-expression-based prognostic models with prior clinical severity scores leads to significant improvement in prediction of 30-day mortality as measured via AUROC and net reclassification improvement index. These models provide an opportunity to develop molecular bedside tests that may improve risk stratification and mortality prediction in patients with sepsis.

Introduction

Sepsis, recently defined as organ dysfunction caused by a dysregulated host response to infection¹, contributes to half of all in-hospital deaths in the US and is the leading cost for the US healthcare system^2,3. Although in-hospital sepsis outcomes have improved over the last decade with standardized sepsis care, mortality rates remain high (10–35%)⁴. Sepsis treatment still focuses on general management strategies including source control, antibiotics, and supportive care. Despite dozens of clinical trials, no treatment specific for sepsis has been successfully utilized in clinical practice⁵. Two consensus papers suggest that continued failure of proposed sepsis therapies is due to substantial patient heterogeneity in the sepsis syndrome and a lack of tools to accurately categorize sepsis at the molecular level^5,6. Current tools for risk stratification include clinical severity scores such as APACHE or SOFA as well as blood lactate levels. While these measures assess overall illness severity, they do not adequately quantify the patient’s dysregulated response to the infection and therefore fail to achieve the personalization necessary to improve sepsis care⁷. Some peptide markers of sepsis severity have been validated (e.g. proadrenomedullin⁸ among others⁹), but these are not yet cleared for clinical use.

A molecular definition of the severity of the host response in sepsis would provide several benefits. First, improved accuracy in sepsis prognosis would improve clinical care through appropriate matching of patients with resources: the very sick can be diverted to intensive care unit (ICU) for maximal intervention, while patients predicted to have a better outcome may be safely watched in the hospital ward or discharged early. Second, more-precise estimates of prognosis would allow for better discussions regarding patient preferences and the utility of aggressive interventions. Third, better molecular phenotyping of sepsis patients has the potential to improve clinical trials through both (1) patient selection and prognostic enrichment for drugs and interventions (e.g., excluding patients predicted to have good vs. bad outcomes) and (2) better assessments of observed-to-expected ratios for mortality^5,6. Finally, as a direct quantitative measure of the dysregulation of the host response, molecular biomarkers could potentially help form a quantitative diagnosis of sepsis as distinct from non-septic acute infections^10,11. Thus, overall, a quantitative test for sepsis could be a significant asset to clinicians if deployed as a rapid assay.

Previously, a number of studies have used whole-blood transcriptomic (genome-wide expression) profiling to risk-stratify sepsis patients^12,13,14,15. Important insights from these studies suggest that more-severe sepsis is accompanied by an overexpression of neutrophil proteases, adaptive immune exhaustion, and an overall profound immune dysregulation^{12,13,16,17,18,19}. Quantitative evaluation of host response profiles based on these observations has been validated prospectively to show specific outcomes^14,15, but none have yet been translated into clinical practice. Still, the availability of high-dimensional transcriptomic data from these accumulated studies has created unprecedented opportunities to address questions across heterogeneous representations of sepsis (different ages, pathogens, and patient types) that could not be answered by any individual cohort.

Transcription-based modeling has been deployed across many diseases to improve prognostic accuracy. These are typically developed in a method-specific manner using data collected from single cohorts. As a result, prognostic models often lack the generalizability that is necessary to confer utility in clinical applications²⁰. In contrast, community modeling approaches (where multiple groups create models using the same training data) can provide an opportunity to explicitly evaluate predictive performance across a diverse collection of prognostic models sampled from across a broad solution space^{21,22,23,24,25}. Here, we systematically identified a large collection of both public and privately held gene expression data from clinical sepsis studies at the time of sepsis diagnosis. Three scientific groups were then invited to build models to predict 30-day mortality based on gene expression profiles. These three groups produced four different prognostic models, which were then validated in external validation cohorts composed of patients with either community-acquired sepsis or hospital-acquired infections (HAIs).

Results

Analysis overview

We used a community approach to build gene-expression-based models predictive of sepsis-induced mortality using all available gene expression datasets (21 total cohorts, Table 1). In this community approach, three different teams (Duke University, Sage Bionetworks, and Stanford University) performed separate analyses using the same input data; we thus sampled the possible model space to determine whether output performance is a function of analytical approaches (Fig. 1). Two models (Duke and Stanford) used a parameter-free difference-of-means formula for integrating gene expression, and the other two models (both from Sage Bionetworks) used parametrized penalized logistic regression (LR)²⁶ and random forests (RF)²⁷.
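As an illustration of what a parameter-free difference-of-means score can look like, the following minimal sketch subtracts the mean expression of down-regulated genes from the mean expression of up-regulated genes for a single sample. The function name, the use of an arithmetic mean, and the placeholder gene names `GENE_X` and `GENE_Y` are assumptions made for illustration; the exact formulas used by the Duke and Stanford models are described in the supplementary text and may differ.

```python
from statistics import mean

def difference_of_means_score(expression, up_genes, down_genes):
    """Mean expression of up-regulated genes minus mean of down-regulated genes.

    `expression` maps gene symbol -> normalized (log-scale) expression
    for one sample. Genes not measured in the sample are skipped.
    """
    up = [expression[g] for g in up_genes if g in expression]
    down = [expression[g] for g in down_genes if g in expression]
    if not up or not down:
        raise ValueError("need at least one measured gene in each set")
    return mean(up) - mean(down)

# Hypothetical sample: values and the down-regulated gene names are
# placeholders, not data from the study.
sample = {"DEFA4": 9.0, "MPO": 8.0, "GENE_X": 5.0, "GENE_Y": 6.0}
score = difference_of_means_score(sample, ["DEFA4", "MPO"], ["GENE_X", "GENE_Y"])
print(score)  # 3.0
```

A score of this form has no fitted coefficients, which is one reason it may transfer across cohorts measured on different platforms.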

Table 1 Datasets included in the analysis

Each of the four models was trained using 12 discovery cohorts (485 survivors and 157 non-survivors) composed primarily of patients with community-acquired sepsis. Performance was evaluated across two groups of heterogeneous validation datasets (five community-acquired sepsis cohorts with 161 survivors and 28 non-survivors, and four HAI cohorts with 258 survivors and 24 non-survivors; Table 1). The community-acquired sepsis and HAI cohorts were considered separately in validation because of their known differences in host-response profiles. Due to the nature of public datasets, we had limited information on demographics, infection, severity, and treatment, and so these variables were not controlled for in model selection. The cohorts included patients from multiple age groups, countries, and hospital wards (emergency department, hospital ward, medical ICU, and surgical/trauma ICU). As expected in varied patient populations, mortality rates varied widely across cohorts (mean 23.2% ± 13.4%).

Prognostic power assessments

Model performance was primarily evaluated using receiver operating characteristic (ROC) analysis separately in the discovery, validation, and HAI cohorts. Boxplots of the AUROCs for each model are shown in Fig. 2; data from individual cohorts and summary ROC curves are shown in Supplementary Tables 1 and 2 and Supplementary Fig. 4. Across the five community-acquired sepsis validation datasets, the four models showed generally preserved prognostic power, with summary AUROCs ranging from 0.75 (95% CI 0.63–0.84, Sage LR) to 0.89 (95% CI 0.56–0.99, Stanford). Three of the four models performed well in classifying the HAI datasets (summary AUROCs 0.81–0.87 in the Duke, Sage LR, and Stanford models); one model performed poorly in HAI (summary AUROC 0.52, 95% CI 0.36–0.68, Sage RF). Overall, most models performed equivalently in discovery, validation, and HAI datasets. To judge other performance metrics including accuracy, specificity, negative predictive value, and positive predictive value, we set thresholds for each model at the nearest sensitivity >90% (Supplementary Fig. 5). The raw prediction scores for each sample in each model are available for further interpretation²⁸.
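For readers unfamiliar with the metric, a per-cohort AUROC can be computed as the probability that a randomly chosen non-survivor receives a higher score than a randomly chosen survivor. The sketch below uses that pairwise definition on made-up scores; it does not reproduce the summary ROC procedure used to pool cohorts in this study, and all names and values are hypothetical.

```python
def auroc(scores, labels):
    """Probability that a random non-survivor outscores a random survivor.

    labels: 1 = non-survivor (event), 0 = survivor. Ties count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy cohort: two non-survivors and four survivors (illustrative values only).
scores = [0.9, 0.8, 0.4, 0.35, 0.3, 0.1]
labels = [1, 0, 1, 0, 0, 0]
print(auroc(scores, labels))  # 0.875
```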

Using the validation and HAI cohorts, we compared the present models to a single prognostic model made with all genes previously associated with mortality (see Supplementary Methods)^{13,17,18,19,29,30}. We found that three of the four models show substantial improvement (average increase of roughly 0.1) compared to the prior models; this reached significance for the Duke and Stanford models (Supplementary Table 3).

To assess whether the models contained complementary orthogonal information, we evaluated the prediction accuracy of an ensemble model based on the predictions of all four individual models (see Supplementary Methods). The prognostic power of the ensemble model was at an average AUROC of 0.81 across all five validation datasets (paired t-tests vs. individual models all P = NS, Supplementary Table 4) indicating that the present diagnostic accuracy may be a rough estimate of the ceiling of prognostic accuracy inherent in these data.

Performance in predicting non-survivors was evaluated using the area under the precision–recall curve (AUPRC)³¹ (Fig. 2b and Supplementary Table 5). The AUPRCs for non-survivor prediction were notably lower than the AUROCs, as was expected from the highly unbalanced classes (rare mortalities). This suggests that the models’ primary utility may be in ruling out mortality for individuals much less likely to die within 30 days (those less likely to require substantial intervention) as opposed to accurately identifying the minority of patients who are highly likely to die within 30 days. In contrast, the AUPRC of the ensemble model averaged 0.428 in the validation cohorts (Supplementary Table 4), suggesting complementarity in discriminatory power between individual models.
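The effect of class imbalance on precision–recall metrics can be seen in a small example. The sketch below computes average precision, one common estimator of the AUPRC; whether the study used this estimator or another interpolation is not stated here, so treat the choice as an assumption. The toy cohort has one non-survivor among ten patients, ranked second by the hypothetical score.

```python
def average_precision(scores, labels):
    """Mean of the precision values at each rank where an event is retrieved.

    labels: 1 = non-survivor (event), 0 = survivor.
    """
    n_events = sum(labels)
    if n_events == 0:
        raise ValueError("no events")
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / n_events

# One event in ten patients, outscored by a single survivor (made-up values).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
print(average_precision(scores, labels))  # 0.5
```

In this toy cohort the event outscores eight of nine survivors, so the pairwise AUROC is 8/9, yet a single higher-scoring survivor halves the average precision. This is the pattern expected when deaths are rare.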

We examined the effects of clinical time course on the gene scores in the two validation datasets that tracked longitudinal data (GSE21802 and GSE54514; Supplementary Fig. 6). We found no differences in slope (change in score over time) between the survivors and non-survivors, although the scores in non-survivors were significantly higher than in survivors during the entire hospital stay, possibly indicating a failure to restore homeostasis.

Comparison to standard predictors

We next assessed whether these gene expression-based predictors of mortality outperformed standard clinical severity scores. Notably, clinical measures of severity were only available in a subset of cohorts (eight discovery, three validation, three HAI; Table 2). The mean differences in the AUROCs of the gene model over clinical severity scores were: Duke −0.044; Sage LR 0.010; Sage RF 0.094; Stanford 0.064; only the Stanford model trended towards significance (paired t-test P = 0.098). However, when we combined gene models and clinical severity scores into joint predictors, each combination significantly outperformed clinical severity scores alone (mean difference Duke 0.077; Sage LR 0.076; Sage RF 0.16; Stanford 0.098; all paired t-tests P ≤ 0.01).

Table 2 AUROC with genomic features and clinical severity

We next examined the continuous net reclassification improvement (cNRI) index to quantify how well the model with gene scores reclassifies survivors over the model with clinical severity scores in each of these same datasets (Table 3). In the validation and HAI cohorts, the mean NRI was 0.53–0.84 (potential range −2 to 2, where 2 reflects all patients with improved classification). For the Duke and Stanford scores, half of the validation and HAI datasets showed significant NRI compared to standard predictors alone. This suggests that the gene expression-based predictors add significant prognostic utility to standard clinical metrics.

Table 3 Continuous net reclassification index for gene scores over clinical severity scores

Finally, we examined test characteristics at a high-sensitivity cutoff (95%) and a high-specificity cutoff (95%) for the gene scores in comparison to baseline error models (Supplementary Table 6) and in comparison to clinical severity scores (Supplementary Table 7). Overall mean accuracy of the joint clinical and gene scores was higher in the validation and HAI datasets (0.58–0.72 and 0.64–0.79 across the models, respectively) compared to clinical scores alone (0.57 and 0.62, respectively).

Comparison across models

We next studied whether models were correctly classifying the same patients or different groups of patients. We tested model correlations across all patients by comparing the relative ranks of each patient within each model instead of comparing raw model scores. We found the models were moderately correlated (Spearman rho = 0.35–0.61, Supplementary Fig. 7). We then evaluated the agreement between the four models by comparing model-specific patient classifications (Supplementary Table 8). For this purpose, we chose cutoffs for each model that yielded 90% sensitivities for non-survivors. We then labeled patients as being either always misclassified, correctly classified by 1 or 2 models (no consensus), or correctly classified in at least 3 of 4 models (consensus). As expected by the 90% sensitivity threshold, 10% of patients were misclassified by all models. In the remaining cases, 63% were correctly predicted by consensus and 27% did not reach consensus. Together, the model correlation and consensus analyses showed concordant classification for 73% of patients (63% correct by consensus plus 10% misclassified by all models), with variance leading to discordance in the remaining 27%. These results suggest that although the models use different genes, they are reaching the same conclusions about most patients.
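The consensus labeling described above can be sketched as follows. Each model gets its own cutoff, chosen as the highest score threshold that still captures at least 90% of non-survivors, and each patient is then binned by how many models classify them correctly. The function names, the model labels A to D, and the toy scores are hypothetical; the exact cutoff rule used in the study may differ in detail.

```python
import math

def threshold_for_sensitivity(scores, labels, target=0.90):
    """Highest cutoff such that at least `target` of events score >= cutoff."""
    events = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not events:
        raise ValueError("no events")
    needed = math.ceil(target * len(events))
    return events[needed - 1]

def consensus_labels(model_scores, labels, target=0.90):
    """Bin each patient by the number of models that classify them correctly."""
    cutoffs = {m: threshold_for_sensitivity(s, labels, target)
               for m, s in model_scores.items()}
    out = []
    for i, y in enumerate(labels):
        # A model is correct when its call (score >= cutoff) matches the outcome.
        correct = sum((model_scores[m][i] >= cutoffs[m]) == (y == 1)
                      for m in model_scores)
        if correct == 0:
            out.append("always misclassified")
        elif correct >= 3:
            out.append("consensus")
        else:
            out.append("no consensus")
    return out

# Five toy patients (1 = non-survivor) scored by four hypothetical models.
labels = [1, 1, 0, 0, 0]
model_scores = {
    "A": [0.9, 0.6, 0.7, 0.2, 0.1],
    "B": [0.8, 0.5, 0.6, 0.3, 0.2],
    "C": [0.7, 0.4, 0.5, 0.45, 0.1],
    "D": [0.9, 0.3, 0.35, 0.4, 0.2],
}
print(consensus_labels(model_scores, labels))
```

In this toy example the third patient, a survivor, scores above every model's cutoff and is therefore always misclassified, while the fourth is called correctly by only two models and lands in the no-consensus bin.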

Biology of the gene signatures of mortality

Gene predictors were chosen for both optimized prognostic power and sparsity in our data-driven approach and so do not necessarily represent key nodes in the pathophysiology of sepsis. Still, we examined whether interesting biology was being represented in the models. We first looked for overlap in the gene sets used for prediction across the four models, but found few genes in common (Table 4). Since each signature had too few genes for robust analysis, we analyzed the genes from all four models in aggregate, resulting in 58 total genes (31 upregulated and 27 downregulated; Supplementary Table 9).

Table 4 Genomic predictors of sepsis mortality

First, we studied whether the differential gene expression identified may be indicative of cell-type shifts in the blood. The pooled gene sets were tested in several known in vitro gene expression profiles of sorted cell types to assess whether gene expression changes are due to cell-type enrichment (Supplementary Fig. 8). No significant differences were found, but the trend showed an enrichment of M1-polarized macrophages and band cells (immature neutrophils), and underexpression in dendritic cells. This is consistent with a heightened pro-inflammatory response and a decrease in adaptive immunity in patients who ultimately progress to mortality¹².

We next tested the 58 genes for enrichment in curated gene sets from gene ontologies, Reactome and KEGG pathways using two different enrichment methodologies: gene-based over-representation analysis and expression-based GSEA. After multiple hypothesis testing corrections, 4 out of 3330 gene sets tested were significantly over-represented at an FDR of 5% (Supplementary Table 10a). These include genes related to the regulation of T cell activation and proliferation, cytokine-mediated signaling pathway and RHO GTPases activation of CIT. The relatively low number of pathways enriched in over-representation analysis may be due to the low number of genes in the predictor set. Enrichment of the expression of the 58 gene predictors was also tested using GSEA. Out of 1576 curated pathways, 546 were enriched at an FDR of 5%; top pathways are shown in Supplementary Table 10b. A brief examination of enriched pathways activated in non-survivors showed mostly inflammation-related pathways, while survivors showed largely developmental pathways. Since the models were generated in a way that penalized the inclusion of genes that were redundant for classification purposes, and since genes redundant for classification purposes are often from the same biological pathway, their exclusion from the models limits the utility of enrichment analyses.
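Gene-based over-representation analysis is commonly implemented as a hypergeometric (one-sided Fisher) test followed by a false discovery rate correction. The sketch below shows that standard approach with a Benjamini–Hochberg adjustment. Both choices are assumptions, since the specific test and correction used in the study are not named in this section, and the background size and overlap counts in the example are made up.

```python
from math import comb

def hypergeom_pvalue(k, K, n, N):
    """P(X >= k): chance of seeing at least k gene-set members when drawing
    n genes from a background of N genes, K of which are in the gene set."""
    upper = min(K, n)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)

def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical example: 5 of 58 predictor genes fall in a 100-gene pathway,
# against an assumed background of 20000 genes.
p = hypergeom_pvalue(k=5, K=100, n=58, N=20000)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

With only 58 query genes, even modest overlaps are needed to reach significance after correcting across thousands of gene sets, which is consistent with the small number of over-represented sets reported above.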

Discussion

Sepsis is a heterogeneous disease, including a wide possible range of patient conditions, pre-existing comorbidities, severity levels, infection incubation times, and underlying immune states. Many investigators have hypothesized that molecular profiling of the host response may better predict sepsis outcomes. Here, we extensively assessed the predictive performance of whole-blood gene expression using a community-based modeling approach. This approach was designed to evaluate predictive capabilities in a manner that was independent of specific methodological preferences, and instead created robust prognostic models across a broad solution space. We developed four state-of-the-art data-driven prognostic models using a comprehensive survey of available data including 21 different sepsis cohorts (both community-acquired and hospital-acquired, N = 1113 patients), with summary AUROCs around 0.85 for predicting 30-day mortality. We also showed that combining the gene-expression-based models with clinical severity scores leads to significant improvement in the ability to predict 30-day mortality, indicating clinical utility.

Prediction of outcomes up to 30 days after the time of sampling represents a difficult task, given that the models must account for all interventions that occur as part of the disease course. An accuracy of 100% is likely not only unachievable but also undesirable, as it would suggest that mortality is pre-determined and independent of clinical care. Given this background, and since similar prognostic power was observed across all individual models and the ensemble model, our prognostic accuracy may represent an upper bound on transcriptomic-based prediction of sepsis outcomes. In addition, since prognostic accuracy was retained across broad clinical phenotypes (children and adults, with bacterial and viral sepsis, with community-acquired and HAIs, from multiple institutions around the world) the models appear to have successfully incorporated the broad clinical heterogeneity of sepsis. The derived discriminatory power of the gene models (AUCs near 0.85) is at least similar to the AUC of proadrenomedullin (0.83) in a recent large prospective trial (TRIAGE study)⁸. Furthermore, the impact of the addition of the severity score to clinical practice could be substantial. If envisioned as a rule-out test for mortality (e.g. setting the threshold at a 95% sensitivity), the Duke and Stanford scores showed large increases in specificity (13–21 percentage point absolute increase) compared with standard clinical severity scores alone. However, peptide assays have the significant advantage of potentially very rapid turnaround times. Moreover, a paucity of randomized data in application of existing biomarkers makes it unclear whether improved risk stratification will actually improve health and/or reduce costs⁹.

Sepsis remains difficult to define. The most recent definition of sepsis (Sepsis-3) requires the presence of organ dysfunction as measured by an increase in SOFA of two or more points over baseline¹. Determining the SOFA score can help guide which organ systems are dysfunctional, but this fails to characterize the biological changes that are driving the septic response. Molecular tools like the ones developed here provide an opportunity to provide a simple, informative prognosis for sepsis by improving patient risk stratification. Host-response profiles could also help to classify patients with sepsis as opposed to non-septic acute infections. Identifying such high-risk patients may also lead to greater success in clinical trials through improved enrichment strategies. This identification of subgroups or ‘endotypes’ of sepsis has already been successfully applied to both pediatric and adult sepsis populations^14,15.

The goal of this study was to generate predictive models but not necessarily to define sepsis pathophysiology. However, our community approach identified a large number of genes associated with sepsis mortality that may point to underlying biology. The association with immature neutrophils and inflammation in sepsis has been previously shown³². Results of this study confirm this finding as we note increases in the neutrophil chemoattractant IL-8 as well as neutrophil-related antimicrobial proteins (DEFA4, BPI, CTSG, MPO). These azurophilic granule proteases may indicate the presence of very immature neutrophils (metamyelocytes) in the blood³³. Many of these genes have also been noted in the activation of neutrophil extracellular traps (NETs)^34,35. NET activation leads to NETosis, a form of neutrophil cell death that can harm the host³⁵. Whether these involved genes are themselves harmful or are markers of a broader pathway is unknown. Along with immune-related changes, there are changes in genes related to hypoxia and energy metabolism (HIF1A, NDUFV2, TRIB1). Of particular interest is the increase in HIF1A, a hypoxia-induced transcription factor. This upregulation is corroborated by previous findings in patients with higher early mortality in the larger E-MTAB-4421.51 cohort¹³. This may be evidence of either a worsening cytopathic hypoxia in septic patients who progress to mortality, or of a shift away from oxidative metabolism (“pseudo-Warburg” effect), or both³⁶. Modification of the Warburg effect due to sepsis has been implicated in immune activation³⁷, trained immunity³⁸, and immunoparalysis³⁹.

The present study has several limitations. First, as a retrospective study of primarily publicly available data, we are not able to control for demographics, infection, patient severity, or individual treatment. However, our successful representation of this heterogeneity likely contributed to the successful validation in external community-acquired and hospital-acquired sepsis cohorts. Second, despite a large amount of validation data, we do not present the results of any prospective clinical studies of these biomarkers. Prospective analysis will be paramount in translating the test to a clinically relevant assay. In addition, while some rapid PCR techniques could bring the potential turnaround time of a gene-expression-based assay to under 30 min, this will require a substantial engineering effort. Third, the genes identified here were specifically chosen for their performance as biomarkers, not based on known relevance to the underlying pathophysiology of mortality in sepsis. As such, the biological insights gained from these biomarkers will need to be confirmed and expanded on by studies focused on the entire perturbation of the transcriptome during sepsis and through targeted study of individual genes and pathways. Fourth, the use of 30-day mortality as our endpoint is a crude measure of severity, and may miss important intermediate endpoints such as prolonged ICU stay or poor functional recovery. While such intermediate outcomes were not available in the current data, the models’ abilities to predict these functional outcomes will need to be tested prospectively. Fifth, despite a seemingly large total N (1113), we were unable to perform robust subgroup analyses (such as infection site or pathogen type), although a broad range of clinical circumstances is included across the datasets. Finally, we note that some may find as a weakness the limited overlap in genes chosen by the four models. However, in the search for sparse models using highly collinear data such as gene expression, near-random selection of variables can occur⁴⁰. The similar performance of the classifiers using disparate gene sets is thus further evidence that these models may be near an upper bound of discriminatory ability using whole-blood gene expression data.

Researchers, clinicians, funding agencies, and the public are all advocating for improved platforms and policies that encourage sharing of clinical trial data⁴¹. Meta-analysis of multiple studies leads to results that are more reproducible than from similarly powered individual cohorts⁴². The community approach used here has shown that aggregated transcriptomic data can be used to define novel prognostic models in sepsis. This collaboration of multidisciplinary teams of experts encompassed both analytical and statistical rigor along with deep understandings of both the transcriptomics data and clinical data. To advance beyond the work presented here, more data must be made available, including demographics, treatments, and clinical outcomes, as well as other data types like proteomics and metabolomics. Data-driven collaborative modeling approaches using these data can be effective in discovering new clinical tools.

We have shown comprehensively that patients with sepsis can be risk-stratified based on their gene expression profiles at the time of diagnosis. The overall performance of expression-based predictors paired with clinical severity scores was significantly higher than clinical scores alone in multiple cohorts with heterogeneous sepsis. These gene expression models reflect a patient’s underlying biological response state and could potentially serve as a valuable clinical assay for prognosis and for defining the host dysfunction responsible for sepsis. These results serve as a benchmark for future prognostic model development and as a rich source of information that can be mined for additional insights. Improved methods for risk stratification would allow for better resource allocation in hospitals and for prognostic enrichment in clinical trials of sepsis interventions (removing those patients who will likely survive regardless of intervention). Ultimately, prospective clinical trials will be needed to confirm and extend the findings presented here.

Methods

Systematic search

Two public gene expression repositories (NCBI GEO, EMBL-EBI ArrayExpress) were searched for all clinical-gene expression microarray or next-generation sequencing (NGS/RNAseq) datasets that matched any of the following search terms: sepsis, SIRS, trauma, shock, surgery, infection, pneumonia, critical, ICU, inflammatory, nosocomial. Clinical studies of acute infection and/or sepsis using whole blood were retained. Datasets that utilized endotoxin or lipopolysaccharide infusion as a model for inflammation or sepsis were excluded. Datasets derived from sorted cells (e.g., monocytes, neutrophils) were also excluded.

Overall, 16 studies containing 17 different cohorts were included (Table 1a, b). These 16 studies include expression profiles from both adult^{15,17,19,43,44,45,46,47,48,49,50,51,52} and pediatric^{48,53,54,55,56} cohorts. In these cases, the gene expression data were publicly available. When mortality and severity phenotypes were unavailable in the public data, the data contributors were contacted for this information. This included datasets E-MTAB-1548 (refs. ^13,57), GSE10474 (ref. ⁴⁴), GSE21802 (ref. ⁵⁰), GSE32707 (ref. ⁴⁷), GSE33341 (ref. ⁵¹), GSE63042 (ref. ¹⁹), GSE63990 (ref. ⁵²), GSE66099 (ref. ⁵⁶), and GSE66890 (ref. ⁴⁹). Furthermore, where longitudinal data were available for patients admitted with sepsis, we only included data derived from the first 48 h after admission. The E-MTAB-4421 and E-MTAB-4451 cohorts both came from the GAinS study¹⁵, used the same inclusion/exclusion criteria, and were processed on the same microarray type. Thus, after re-normalizing from raw data, we used ComBat normalization⁵⁸ to co-normalize these two cohorts into a single cohort, which we refer to as E-MTAB-4421.51. For this study, data were included only for patients sampled on the day of hospital admission. In addition to the above 17 datasets, we identified four additional privately held datasets (Table 1c) representing patients with HAI. In-depth summaries of each HAI cohort can be found in the supplementary text.

We selected cohorts as either discovery or validation based on their availability. Studies for which outcome data were readily available were included as discovery cohorts. Only GSE54514 (ref. ¹⁷) was initially held out for validation given its large size and representative patient characteristics. After we had trained the models some outcomes data became newly available, so these were added as validation cohorts^15,50,51,52. Additionally, given the known differences in sepsis pathophysiology and gene expression profiles as compared to patients with community-acquired sepsis^56,59, the HAI datasets were set aside as a second validation cohort. The validation cohorts were not matched to the discovery cohort on any particular criteria but rather provide a validation opportunity across a heterogeneous range of clinical scenarios.

Gene expression normalization

All Affymetrix datasets were downloaded as CEL files and re-normalized using the gcRMA method (R package affy⁶⁰). Output from other array types was normal-exponential background corrected and then between-arrays quantile normalized (R package limma⁶¹). For all gene analyses, the mean of all probes mapping to the same gene was set as the gene expression level. All probe-to-gene mappings were downloaded from GEO from the most current SOFT files.
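As an illustration only, the two generic steps named above (between-array quantile normalization and averaging probes per gene) can be sketched in plain Python. This is not the limma or affy implementation used in the study; the function names, the tie handling, and the toy inputs are assumptions made for the sketch.

```python
from collections import defaultdict
from statistics import mean


def quantile_normalize(arrays):
    """Between-array quantile normalization (simplified sketch).

    arrays: list of equal-length lists, one list of probe intensities per array.
    Each value is replaced by the mean, across arrays, of the values sharing
    its rank. Ties are broken by position, which is a simplification.
    """
    n_probes = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th smallest value over all arrays.
    reference = [mean(s[k] for s in sorted_arrays) for k in range(n_probes)]
    normalized = []
    for a in arrays:
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


def collapse_probes_to_genes(probe_values, probe_to_gene):
    """Average all probes that map to the same gene (one array at a time)."""
    grouped = defaultdict(list)
    for probe, value in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:  # probes without a gene mapping are dropped
            grouped[gene].append(value)
    return {gene: mean(values) for gene, values in grouped.items()}
```

After quantile normalization every array shares the same empirical distribution, which is what makes intensities comparable across arrays before probes are summarized to genes.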

Two of the cohorts, CAPSOD¹⁹ and the Duke HAI cohort, were assayed via NGS. For compatibility with microarray studies, expression data from NGS datasets were downloaded as counts per million total reads (CPM) and were normalized using a weighted linear regression model using the voom method⁶² (R package limma⁶¹). The estimated precision weights of each observation were then multiplied with the corresponding log2(CPM) to yield final gene expression values.
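The final step described above, multiplying each log2(CPM) value by its precision weight, can be sketched as follows. The weights themselves come from voom's fitted mean-variance trend, which is not reproduced here; in this sketch they are simply passed in as hypothetical inputs. The offsets in the log2(CPM) formula (0.5 on counts, 1 on library size) follow the convention described for voom and are only needed when starting from raw counts.

```python
import math


def log2_cpm(counts, prior=0.5):
    """log2 counts-per-million for one sample.

    counts: dict mapping gene -> raw read count.
    A small offset keeps genes with zero reads finite on the log scale.
    """
    library_size = sum(counts.values())
    return {
        gene: math.log2((count + prior) / (library_size + 1.0) * 1e6)
        for gene, count in counts.items()
    }


def weighted_expression(log_cpm, weights):
    """Multiply each log2(CPM) value by its precision weight.

    weights: dict mapping gene -> precision weight (assumed to be supplied by
    a separate voom fit; the values used in any example are illustrative).
    """
    return {gene: weights[gene] * value for gene, value in log_cpm.items()}
```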

Prediction models

Prediction models were built by comparing patients who died within 30 days of hospital admission with sepsis to patients who did not. In the CAPSOD dataset (which was used in model training) we excluded two patients with unclear mortality outcomes, and one patient who died in-hospital but after 30 days. Mortality was modeled as a binary variable since time-to-event data were not available. Overall, a total of four prognostic models were built by three different academic groups (Duke University, Sage Bionetworks, and Stanford University). All four models started with the same gene expression data in the discovery phase. Each model was built in two phases: a feature selection phase based on statistical thresholds for differential gene expression across all discovery cohorts, and then a model construction phase optimizing classification power. Full descriptions of the four models can be found in the supplementary text and in Supplementary Figs. 1–3.

Comparison with severity scores

We compared the prognostic accuracy of the gene scores with the prognostic accuracy of clinical severity scores (APACHE II, PELOD, PRISM, SAPS II, SOFA, and the Denver score) where such information was available. No datasets had more than one clinical severity score type available. These clinical severity scores were not necessarily built to predict mortality in the specific populations in which they were used here, but nonetheless serve as important comparators for the gene expression models. To compare prognostic power in the datasets that included subject-level severity data, LR was performed to predict mortality using either the clinical severity score or the given gene model’s output score. We then tested a joint model (mortality as a function of clinical severity and gene score, without interaction term) and measured the AUROC of the combined model. Comparisons were made between AUROCs with paired t-tests. We further computed the cNRI to quantify how well our joint model reclassifies over clinical severity scores alone⁶³. The cNRI is the sum of two scores: the improvement in classification of a positive event (here, mortality) by the tested model, plus the improvement in classification of a negative event (here, survival) by the tested model. Each improvement has a possible range of [−1, 1], so the full cNRI has a possible range of [−2, 2]. A score of −2 would mean that every prediction is made worse by the addition of the tested model; a score of 2 means that every prediction is made more accurate by the addition of the tested model. Finally, we calculated test characteristics at both a high-sensitivity cutoff and a high-specificity cutoff, for both clinical scores and gene scores separately, and for the joint clinical-gene models. These are reported as mean ± standard deviation across datasets in summary tables.
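The two-component structure of the cNRI described above can be made concrete with a short Python sketch. It assumes that predicted mortality probabilities from the clinical-score-only model and from the joint model are already available for each patient; the function and argument names are illustrative and do not come from the study's code.

```python
def continuous_nri(p_old, p_new, died):
    """Continuous net reclassification improvement (sketch).

    p_old: predicted mortality probabilities from the baseline model
           (here, clinical severity score alone).
    p_new: predicted probabilities from the tested model (here, the joint model).
    died:  truthy for non-survivors, falsy for survivors.
    Returns (cNRI, event component, non-event component).
    """
    events = [(o, n) for o, n, d in zip(p_old, p_new, died) if d]
    nonevents = [(o, n) for o, n, d in zip(p_old, p_new, died) if not d]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")

    # Non-survivors are better classified when predicted risk moves up.
    up_events = sum(n > o for o, n in events) / len(events)
    down_events = sum(n < o for o, n in events) / len(events)
    # Survivors are better classified when predicted risk moves down.
    up_nonevents = sum(n > o for o, n in nonevents) / len(nonevents)
    down_nonevents = sum(n < o for o, n in nonevents) / len(nonevents)

    event_nri = up_events - down_events            # range [-1, 1]
    nonevent_nri = down_nonevents - up_nonevents   # range [-1, 1]
    return event_nri + nonevent_nri, event_nri, nonevent_nri
```

Each component is a difference of two proportions, which is why each lies in [−1, 1] and their sum lies in [−2, 2].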

Discriminatory power analyses

We examined class discriminatory power for separating survivors from non-survivors using ROC curves of the gene scores within datasets. The area under the ROC curve (AUROC) was calculated using the trapezoidal method. Summary ROC curves were calculated via the method of Kester and Buntinx⁶⁴. We examined the ability of the models to predict non-survivors using precision–recall curves generated from the gene scores in each examined dataset. Precision–recall curves of the gene scores were constructed within datasets, and the area under the precision–recall curve (AUPRC) was calculated using the trapezoidal method.
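A minimal sketch of the trapezoidal AUROC calculation is given below, assuming that a higher gene score indicates higher predicted risk of death and that labels are coded 1 for non-survivors and 0 for survivors. The function names are illustrative; tied scores are handled by adding one ROC point per distinct score.

```python
def roc_points(scores, labels):
    """Return ROC points (false-positive rate, true-positive rate).

    scores: model output, higher meaning higher predicted risk (assumption).
    labels: 1 for non-survivors, 0 for survivors.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both classes to draw a ROC curve")
    pairs = sorted(zip(scores, labels), key=lambda t: -t[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        j = i
        # Consume all samples tied at this score before adding a point.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1]:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def trapezoid_area(points):
    """Area under a piecewise-linear curve given as ordered (x, y) points."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


def auroc(scores, labels):
    return trapezoid_area(roc_points(scores, labels))
```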

Enrichment analysis

We conducted two analyses to evaluate the functional enrichment of the genes selected as predictors by the four models. This included a targeted enrichment analysis for cell types as previously described⁵⁶ and an exploratory enrichment analysis that assessed a large number of functionally annotated gene sets.

In a mixed tissue such as blood, shifts in gene expression can be caused by changes in cell-type distribution. To check for this effect, we used gene expression profiles derived from known sorted cell types to determine whether a given set of genes is enriched for genes represented in a specific cell type. In each curated cell-type vector, a ‘score’ is calculated by the geometric mean of the upregulated genes minus the geometric mean of the downregulated genes. A higher ‘score’ represents a greater presence of the given cell type in the differential gene expression signature.
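The score defined above can be sketched in a few lines of Python. The sketch assumes strictly positive expression values in the sorted-cell profile (required for a geometric mean) and, as a further assumption, returns 0 for an empty gene list; gene and function names are illustrative.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive values; 0.0 for an empty list
    (the empty-list behavior is an assumption of this sketch)."""
    if not values:
        return 0.0
    return math.exp(sum(math.log(v) for v in values) / len(values))


def cell_type_score(cell_profile, up_genes, down_genes):
    """Geometric mean of upregulated genes minus that of downregulated genes.

    cell_profile: dict mapping gene -> expression in one sorted cell type.
    up_genes / down_genes: signature genes split by direction of change.
    Genes missing from the profile are skipped.
    """
    up = [cell_profile[g] for g in up_genes if g in cell_profile]
    down = [cell_profile[g] for g in down_genes if g in cell_profile]
    return geometric_mean(up) - geometric_mean(down)
```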

For exploratory enrichment, we curated thousands of gene sets from three widely used databases: gene ontology (GO)⁶⁵, the Reactome database of pathways and reactions in human biology⁶⁶, and the Kyoto Encyclopedia of Genes and Genomes (KEGG)⁶⁷. Our 12 discovery cohorts had approximately 6000 genes in common, which formed a ‘background’ set of genes. Genes that are present in the GO/Reactome/KEGG sets but not in the background set were removed prior to enrichment. We then retained all GO/Reactome/KEGG gene sets containing at least 10% and at least three genes overlapping with the predictor genes. The remaining GO/Reactome/KEGG gene sets were removed to reduce the multiple testing burden. Exploratory enrichment in each of the curated reference gene sets was performed using two different methodologies: gene-based Fisher’s exact test (FET), and, using discovery datasets, expression-based gene set enrichment analysis (GSEA) using the GSVA package from Bioconductor⁶⁸. Significantly enriched reference gene sets were discovered after adjusting the nominal P-values using the Benjamini–Hochberg method.
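The gene-based test and the multiple-testing adjustment can be sketched as follows. The text does not state which tail of Fisher's exact test was used for enrichment, so the one-sided over-representation form shown here is an assumption, as are the function and argument names.

```python
from math import comb


def enrichment_p_value(n_background, n_set, n_predictors, n_overlap):
    """One-sided Fisher's exact test for over-representation (assumed form).

    Probability of observing at least n_overlap predictor genes in a reference
    gene set of size n_set when n_predictors genes are drawn from a background
    of n_background genes (hypergeometric upper tail).
    """
    total = comb(n_background, n_predictors)
    upper = min(n_set, n_predictors)
    tail = sum(
        comb(n_set, k) * comb(n_background - n_set, n_predictors - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Restricting the reference gene sets before testing, as described above, reduces the number of hypotheses m and therefore the size of the Benjamini–Hochberg correction.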

Statistics, normalized data and code availability

All computations were carried out in the R language for statistical computing (version 3.2.0) and MATLAB R2016a (The MathWorks, Inc.). Significance levels for P-values were set at 0.05 and analyses were two-tailed. Analysis source code and final sample scores for the four models, along with other relevant analysis results, are made available through Synapse, an open source collaborative research platform⁶⁹.

Data availability

All the raw and normalized gene expression data, mortality and/or clinical outcomes data, and results are made available through Synapse⁶⁹. Readers may access these data for independent research provided they (i) register onto Synapse and (ii) agree to properly acknowledge both the data contributor(s) and the Synapse portal as described on the Data Use Requirements page⁶⁹.

References

Singer, M. et al. The Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3). JAMA 315, 801 (2016).
Torio, C. M. (ahrq) & Andrews, R. M. (ahrq). National Inpatient Hospital Costs: The Most Expensive Conditions by Payer, 2011. HCUP Statistical Brief #160. (2013).
Liu, V. et al. Hospital deaths in patients with sepsis from 2 independent cohorts. JAMA, https://doi.org/10.1001/jama.2014.5804 (2014).
Kaukonen, K. M., Bailey, M., Pilcher, D., Cooper, D. J. & Bellomo, R. Systemic inflammatory response syndrome criteria in defining severe sepsis. N. Engl. J. Med. 372, 1629–1638 (2015).
Opal, S. M., Dellinger, R. P., Vincent, J. L., Masur, H. & Angus, D. C. The next generation of sepsis clinical trial designs: what is next after the demise of recombinant human activated protein C?*. Crit. Care. Med. 42, 1714–1721 (2014).
Cohen, J. et al. Sepsis: a roadmap for future research. Lancet Infect. Dis. 15, 581–614 (2015).
Shankar-Hari, M. et al. Developing a new definition and assessing new clinical criteria for septic shock: for the third international consensus definitions for sepsis and septic shock (Sepsis-3). JAMA 315, 775–787 (2016).
Kutz, A. et al. The TRIAGE-ProADM score for an early risk stratification of medical patients in the emergency department—development based on a multi-national, prospective, observational study. PLoS ONE 11, e0168076 (2016).
Rast, A., Mueller, B. & Schuetz, P. Clinical scores and blood biomarkers for early risk assessment of patients presenting to the emergency department. OA Emerg. Med. 1, 2 (2014).
Abraham, E. New definitions for sepsis and septic shock: continuing evolution but with much still to be done. JAMA 315, 757–759 (2016).
Bermejo-Martin, J. F., Tamayo, E., Andaluz-Ojeda, D., Fernández, M. M. & Almansa, R. Characterising Systemic Immune Dysfunction Syndrome (SIDS) to fill in the gaps of SEPSIS-2 and SEPSIS-3 definitions. CHEST 151, 518–519 (2017).
Sweeney, T. E. & Wong, H. R. Risk stratification and prognosis in sepsis: what have we learned from microarrays? Clin. Chest. Med. 37, 209–218 (2016).
Almansa, R. et al. Transcriptomic correlates of organ failure extent in sepsis. J. Infect. 70, 445–456 (2015).
Wong, H. R. et al. Developing a clinically feasible personalized medicine approach to pediatric septic shock. Am. J. Respir. Crit. Care. Med. 191, 309–315 (2015).
Davenport, E. E. et al. Genomic landscape of the individual host response and outcomes in sepsis: a prospective cohort study. Lancet Respir. Med. https://doi.org/10.1016/S2213-2600(16)00046-1 (2016).
Parnell, G. et al. Aberrant cell cycle and apoptotic changes characterise severe influenza A infection—a meta-analysis of genomic signatures in circulating leukocytes. PLoS ONE 6, e17186 (2011).
Parnell, G. P. et al. Identifying key regulatory genes in the whole blood of septic patients to monitor underlying immune dysfunctions. Shock 40, 166–174 (2013).
Wong, H. R. et al. Genome-level expression profiles in pediatric septic shock indicate a role for altered zinc homeostasis in poor outcome. Physiol. Genom. 30, 146–155 (2007).
Tsalik, E. L. et al. An integrated transcriptome and expressed variant analysis of sepsis survival and death. Genome Med. 6, 111 (2014).
Bolignano, D. et al. Prognostic models in the clinical arena. Aging Clin. Exp. Res. 24, 300–304 (2012).
Sieberts, S. K. et al. Crowdsourced assessment of common genetic contribution to predicting anti-TNF treatment response in rheumatoid arthritis. Nat. Commun. 7, 12460 (2016).
Allen, G. I. et al. Crowdsourced estimation of cognitive decline and resilience in Alzheimer’s disease. Alzheimers Dement. 12, 645–653 (2016).
Guinney, J. et al. Prediction of overall survival for patients with metastatic castration-resistant prostate cancer: development of a prognostic model through a crowdsourced challenge with open clinical trial data. Lancet Oncol. https://doi.org/10.1016/S1470-2045(16)30560-5 (2016).
Noren, D. P. et al. A crowdsourcing approach to developing and assessing prediction algorithms for AML prognosis. PLoS Comput. Biol. 12, e1004890 (2016).
Saez-Rodriguez, J. et al. Crowdsourcing biomedical research: leveraging communities as innovation engines. Nat. Rev. Genet. 17, 470–486 (2016).
Friedman, J., Hastie, T. & Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1–22 (2010).
Wright, M. N. & Ziegler, A. ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R. arXiv [stat.ML] (2015).
Sweeney, T. E., Perumal, T. M. & Henao, R. Federated Sepsis Analysis: Sample Scores. https://doi.org/10.7303/syn6175515 (2017).
Pachot, A. et al. Systemic transcriptional analysis in survivor and non-survivor septic shock patients: a preliminary study. Immunol. Lett. 106, 63–71 (2006).
Bauer, P. R. et al. Diagnostic accuracy and clinical relevance of an inflammatory biomarker panel for sepsis in adult critically ill patients. Diagn. Microbiol. Infect. Dis. 84, 175–180 (2016).
Saito, T. & Rehmsmeier, M. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE 10, e0118432 (2015).
Mathias, B. et al. Human myeloid-derived suppressor cells are associated with chronic immune suppression after severe sepsis/septic shock. Ann. Surg. https://doi.org/10.1097/SLA.0000000000001783 (2016).
Pham, C. T. N. Neutrophil serine proteases: specific regulators of inflammation. Nat. Rev. Immunol. 6, 541–550 (2006).
Manfredi, A. A., Covino, C., Rovere-Querini, P. & Maugeri, N. Instructive influences of phagocytic clearance of dying cells on neutrophil extracellular trap generation. Clin. Exp. Immunol. 179, 24–29 (2015).
Masuda, S. et al. NETosis markers: quest for specific, objective, and quantitative markers. Clin. Chim. Acta 459, 89–93 (2016).
Nalos, M. et al. Transcriptional reprogramming of metabolic pathways in critically ill patients. Intensive Care Med. Exp. 4, 21 (2016).
Yang, L. et al. PKM2 regulates the Warburg effect and promotes HMGB1 release in sepsis. Nat. Commun. 5, 4436 (2014).
Cheng, S.-C. et al. mTOR- and HIF-1α-mediated aerobic glycolysis as metabolic basis for trained immunity. Science 345, 1250684 (2014).
Liu, T. F. et al. Fueling the flame: bioenergy couples metabolism and inflammation. J. Leukoc. Biol. 92, 499–507 (2012).
Tibshirani, R. Regression shrinkage and selection via the lasso: a retrospective. J. R. Stat. Soc. Ser. B Stat. Methodol. 73, 273–282 (2011).
Bierer, B. E., Li, R., Barnes, M. & Sim, I. A global, neutral platform for sharing trial data. N. Engl. J. Med. 374, 2411–2413 (2016).
Sweeney, T. E., Haynes, W. A., Vallania, F., Ioannidis, J. P. & Khatri, P. Methods to increase reproducibility in differential gene expression via meta-analysis. Nucleic Acids Res. 45, e1 (2017).
Spence, R. P. et al. Validation of virulence and epidemiology DNA microarray for identification and characterization of staphylococcus aureus isolates. J. Clin. Microbiol. 46, 1620–1627 (2008).
Howrylak, J. A. et al. Discovery of the gene signature for acute lung injury in patients with sepsis. Physiol. Genom. 37, 133–139 (2009).
Pankla, R. et al. Genomic transcriptional profiling identifies a candidate blood biomarker signature for the diagnosis of septicemic melioidosis. Genome Biol. 10, R127 (2009).
Berdal, J. E. et al. Excessive innate immune response and mutant D222G/N in severe A (H1N1) pandemic influenza. J. Infect. 63, 308–316 (2011).
Dolinay, T. et al. Inflammasome-regulated cytokines are critical mediators of acute lung injury. Am. J. Respir. Crit. Care. Med. 185, 1225–1234 (2012).
Lill, M. et al. Peripheral blood RNA gene expression profiling in patients with bacterial meningitis. Front. Neurosci. 7, 33 (2013).
Kangelaris, K. N. et al. Increased expression of neutrophil-related genes in patients with early sepsis-induced ARDS. Am. J. Physiol. Lung Cell. Mol. Physiol. 308, L1102–L1113 (2015).
Bermejo-Martin, J. F. et al. Host adaptive immunity deficiency in severe pandemic influenza. Crit. Care 14, R167 (2010).
Ahn, S. H. et al. Gene expression-based classifiers identify Staphylococcus aureus infection in mice and humans. PLoS ONE 8, e48979 (2013).
Tsalik, E. L. et al. Host gene expression classifiers diagnose acute respiratory illness etiology. Sci. Transl. Med. 8, 322ra11 (2016).
Irwin, A. D. et al. Novel biomarker combination improves the diagnosis of serious bacterial infections in Malawian children. BMC Med. Genom. 5, 13 (2012).
Kwan, A., Hubank, M., Rashid, A., Klein, N. & Peters, M. J. Transcriptional instability during evolving sepsis may limit biomarker based risk stratification. PLoS ONE 8, e60501 (2013).
Raman, S. et al. Oxidative phosphorylation gene expression falls at onset and throughout the development of meningococcal sepsis-induced multi-organ failure in children. Intensive Care Med. 41, 1489–1490 (2015).
Sweeney, T. E., Shidham, A., Wong, H. R. & Khatri, P. A comprehensive time-course-based multicohort analysis of sepsis and sterile inflammation reveals a robust diagnostic gene set. Sci. Transl. Med. 7, 287ra71 (2015).
Almansa, R. et al. Transcriptomic evidence of impaired immunoglobulin G production in fatal septic shock. J. Crit. Care 29, 307–309 (2014).
Johnson, W. E., Li, C. & Rabinovic, A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8, 118–127 (2006).
Seok, J. et al. Genomic responses in mouse models poorly mimic human inflammatory diseases. Proc. Natl. Acad. Sci. USA 110, 3507–3512 (2013).
Gautier, L., Cope, L., Bolstad, B. M. & Irizarry, R. A. affy—analysis of Affymetrix GeneChip data at the probe level. Bioinformatics 20, 307–315 (2004).
Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47–e47 (2015).
Law, C. W., Yunshun, C., Wei, S. & Smyth, G. K. Voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biol. 15, R29 (2014).
Pencina, M. J., D’Agostino, R. B. Sr & Steyerberg, E. W. Extensions of net reclassification improvement calculations to measure usefulness of new biomarkers. Stat. Med. 30, 11–21 (2011).
Sweeney, T. E., Braviak, L., Tato, C. M. & Khatri, P. Genome-wide expression for diagnosis of pulmonary tuberculosis: a multicohort analysis. Lancet Respir. Med 4, 213–224 (2016).
The Gene Ontology Consortium. Gene Ontology Consortium: going forward. Nucleic Acids Res. 43, D1049–D1056 (2014).
Fabregat, A. et al. The Reactome pathway knowledgebase. Nucleic Acids Res. 44, D481–D487 (2016).
Kanehisa, M., Furumichi, M., Tanabe, M., Sato, Y. & Morishima, K. KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res. 45, D353–D361 (2017).
Hänzelmann, S., Castelo, R. & Guinney, J. GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinformatics 14, 7 (2013).
Sweeney, T. E., Perumal, T. M. & Henao, R. A community approach to mortality prediction in sepsis via gene expression analysis: Analysis Repository. https://doi.org/10.7303/syn5612563 (2017).

Acknowledgements

We would like to thank the authors who contributed gene expression data to the public domain that we re-analyzed here. We thank Dr. Michael Bauer for helpful discussion of his group’s sepsis data. We thank the Glue Grant investigators for sharing their data publicly; they are supported in this by NIGMS Glue Grant Legacy Award R24GM102656. J.F.B.-M., R.A., and E.T. were supported by Instituto de Salud Carlos III (grants EMER07/050, PI13/02110, PI16/01156). R.J.L. was supported by the National Center for Advancing Translational Sciences of the National Institutes of Health under award number UL1TR001417. The CAPSOD study was supported by NIH (U01AI066569, P20RR016480, HHSN266200400064C). P.K. is supported by grants from the Bill & Melinda Gates Foundation, R01 AI125197-01, 1U19AI109662, and U19AI057229, outside the submitted work. The GAinS study was supported by the National Institute for Health Research through the Comprehensive Clinical Research Network for patient recruitment; Wellcome Trust (Grants 074318 [to J.C.K.], and 090532/Z/09/Z [core facilities Wellcome Trust Centre for Human Genetics including High-Throughput Genomics Group]); European Research Council under the European Union’s Seventh Framework Programme (FP7/2007–2013)/ERC Grant agreement no. 281824 (to J.C.K.), the Medical Research Council (98082 [to J.C.K.]); UK Intensive Care Society; and NIHR Oxford Biomedical Research Centre. The Duke HAI study was supported by a research agreement between Duke University and Novartis Vaccines and Diagnostics, Inc. According to the terms of the agreement, representatives of the sponsor had an opportunity to review and comment on a draft of the manuscript. The authors had full control of the analyses, the preparation of the manuscript, and the decision to submit the manuscript for publication.
For the University of Florida ‘P50’ Study, data were obtained from the Sepsis and Critical Illness Research Center (SCIRC) at the University of Florida College of Medicine, which is supported in part by NIGMS P50 GM111152. This work was supported by Defense Advanced Research Projects Agency and the Army Research Office through Grant W911NF-15-1-0107. The views expressed are those of the author and do not reflect the official policy or position of the Department of Veterans Affairs, the Department of Defense or the U.S. Government.

Author information

Timothy E. Sweeney
Present address: Inflammatix Inc., Burlingame, CA, 94010, USA
Timothy E. Sweeney and Thanneer M. Perumal contributed equally to this work.
Purvesh Khatri, Ephraim L. Tsalik, Lara M. Mangravite and Raymond J. Langley jointly supervised this work.

Authors and Affiliations

Stanford Institute for Immunity, Transplantation and Infection, Stanford University School of Medicine, Stanford, CA, 94305, USA
Timothy E. Sweeney & Purvesh Khatri
Division of Biomedical Informatics Research, Department of Medicine, Stanford University School of Medicine, Stanford, CA, 94305, USA
Timothy E. Sweeney & Purvesh Khatri
Sage Bionetworks, Seattle, WA, 98109, USA
Thanneer M. Perumal, Larsson Omberg & Lara M. Mangravite
Center for Applied Genomics and Precision Medicine, Department of Medicine, Duke University, Durham, NC, 27708, USA
Ricardo Henao, Marshall Nichols, Christopher W. Woods, Geoffrey S. Ginsburg & Ephraim L. Tsalik
Department of Electrical and Computer Engineering, Duke University, Durham, NC, 27708, USA
Ricardo Henao
Division of Pulmonary and Critical Care Medicine, Penn State Milton S. Hershey Medical Center, Hershey, PA, 17033, USA
Judith A. Howrylak
Department of Medicine, Cornell Medical Center, New York, NY, 10065, USA
Augustine M. Choi
Hospital Clínico Universitario de Valladolid/IECSCYL, Valladolid, 47005, Spain
Jesús F. Bermejo-Martin, Raquel Almansa & Eduardo Tamayo
Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, 02115, USA
Emma E. Davenport
Partners Center for Personalized Genetic Medicine, Boston, MA, 02115, USA
Emma E. Davenport
Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
Emma E. Davenport
Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, OX3 7BN, UK
Katie L. Burnham & Julian C. Knight
William Harvey Research Institute, Barts and The London School of Medicine, Queen Mary University, London, EC1M 6BQ, UK
Charles J. Hinds
Division of Infectious Diseases and International Health, Department of Medicine, Duke University, Durham, NC, 27710, USA
Christopher W. Woods & Ephraim L. Tsalik
Durham Veteran’s Affairs Health Care System, Durham, NC, 27705, USA
Christopher W. Woods & Ephraim L. Tsalik
Rady Children’s Institute for Genomic Medicine, San Diego, CA, 92123, USA
Stephen F. Kingsmore
Division of Critical Care Medicine, Cincinnati Children’s Hospital Medical Center and Cincinnati Children’s Research Foundation, Cincinnati, OH, 45223, USA
Hector R. Wong
Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, 45267, USA
Hector R. Wong
Centre for Immunology and Allergy Research, Westmead Institute for Medical Research, Westmead, NSW, 2145, Australia
Grant P. Parnell & Benjamin Tang
Department of Intensive Care Medicine, Nepean Hospital, Sydney, Australia, Penrith, NSW, 2751, Australia
Benjamin Tang
Nepean Genomic Research Group, Nepean Clinical School, University of Sydney, Penrith, NSW, 2751, Australia
Benjamin Tang
Marie Bashir Institute for Infectious Diseases and Biosecurity, Westmead, NSW, 2145, Australia
Benjamin Tang
Department of Surgery, University of Florida College of Medicine, Gainesville, FL, 32610, USA
Lyle L. Moldawer & Frederick E. Moore
Department of Pharmacology, University of South Alabama, Mobile, AL, 36688, USA
Raymond J. Langley

Contributions

Study conception and design: T.E.S., T.M.P., R.H., L.O., P.K., E.L.T., L.M.M., and R.J.L.; contributed materials: M.N., J.A.H., A.C., J.F.B.M., R.A., E.T., E.E.D., K.L.B., C.J.H., J.C.K., C.W.W., S.F.K., G.S.G., H.R.W., G.P.P., B.T., L.L.M., F.E.M., E.L.T., and R.J.L.; performed the analyses: T.E.S., T.M.P., and R.H.; drafted manuscript: T.E.S. and T.M.P.; critical revision: all authors.

Corresponding author

Correspondence to Raymond J. Langley.

Ethics declarations

Competing interests

The ‘Duke’ 18-gene score is the subject of a provisional patent filed by Duke University. The ‘Stanford’ 12-gene score is the subject of a provisional patent filed by Stanford University. T.E.S. and P.K. are co-founders of Inflammatix, Inc., which has a commercial interest in the ‘Stanford’ 12-gene score. The remaining authors declare no competing financial interests.

Additional information

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Sweeney, T.E., Perumal, T., Henao, R. et al. A community approach to mortality prediction in sepsis via gene expression analysis. Nat Commun 9, 694 (2018). https://doi.org/10.1038/s41467-018-03078-2

Received: 12 January 2017
Accepted: 18 January 2018
Published: 15 February 2018
DOI: https://doi.org/10.1038/s41467-018-03078-2
