Sepsis is defined in the clinic as an assemblage of failing physiological and laboratory markers of organ function triggered by infection. Efforts have been made to provide tighter definitions and criteria for sepsis to achieve better and more focused clinical care, research and epidemiology1. Many non-infectious conditions mimic sepsis, representing a considerable clinical challenge2. Accordingly, empiric broad-spectrum antibiotic therapy is deployed, even in cases with no true bacterial infection, or indeed no infection at all, with consequences for global antibiotic resistance rates. For example, patients with COVID-19-associated sepsis were often treated unnecessarily with broad-spectrum antibacterial agents ‘just in case’. In other cases, sepsis is diagnosed late with potentially adverse outcomes3. Thus, accurate and rapid diagnostics for sepsis at both pathogen and host levels are urgently needed, but their clinical implementation remains challenging4. Now, writing in Nature Microbiology, Kalantar and colleagues5 move one step closer to this goal by developing a sepsis diagnostic tool that integrates information from both host and pathogen.

The authors used host and pathogen metagenomic next-generation sequencing (mNGS) of both whole blood and plasma nucleic acids sampled from patients at two US hospitals who were admitted directly to the intensive care unit from the emergency department between 2010 and 2018. Patients were grouped according to whether they had ‘sepsis’ with either a concurrent bloodstream infection or infection identified elsewhere; ‘suspected sepsis’ with negative microbiological culture; critical illness for reasons other than infection; or an ‘indeterminant status’ reflective of clinician diagnostic uncertainty. Notably, 73 out of 92 patients adjudged to not have infection were nevertheless given antibiotics. The authors identified several distinct patterns of host response that distinguished, with decent accuracy, infectious from non-infectious conditions, as well as viral from bacterial causes of sepsis. Similarly encouraging data have been reported by other groups in different settings6,7,8. The main advance of Kalantar and colleagues’ work is that it combined host and pathogen data from plasma nucleic acid into an integrated model that considerably improved diagnostic sensitivity to 97–100% and has potential use as a rule-out test. Despite the promising accuracy of the model in identifying an infectious condition, its specificity for ‘no-infection’ was 78%, meaning that 22% of patients who did not have infection, nearly one in four, would be misdiagnosed. This might be because some of the causative microorganisms (or their nucleic acid) transiently appear in the bloodstream9, or because unrecognized secondary infections contribute to the severity of illness10.
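The arithmetic behind these figures can be made explicit with a minimal sketch. It uses only the numbers quoted above (78% specificity; 73 of 92 patients); the function names are illustrative assumptions and are not taken from the study.

```python
def false_positive_rate(specificity: float) -> float:
    """Fraction of truly non-infected patients that a test flags as infected."""
    return 1.0 - specificity


def proportion(numerator: int, denominator: int) -> float:
    """Simple proportion, guarding against an empty denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return numerator / denominator


# Figures quoted in the text above (not re-derived from the study data).
fpr = false_positive_rate(0.78)   # specificity for 'no-infection'
treated = proportion(73, 92)      # non-infected patients given antibiotics

print(f"False-positive rate at 78% specificity: {fpr:.0%}")
print(f"Non-infected patients given antibiotics: {treated:.0%}")
```

Read this way, a 22% false-positive rate would still compare favourably with the roughly 79% of non-infected patients who received antibiotics in this cohort, although the two proportions are not necessarily calculated over the same set of patients.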

The authors highlight several limitations associated with their work, including a delay of up to 24 hours between blood cultures and the sample taken for mNGS, which was almost always collected after antibiotic administration and thus compromised microorganism retrieval. While the challenge of ensuring sufficient plasma volume to obtain adequate RNA mass seems easily solved, the additional challenge of ascribing clinical significance to detected bacterial sequences could be mitigated by combined use of host-based data. The need for external validation is also rightly emphasized. Much larger patient populations will need to be recruited to ensure not only accuracy but also generalizability across ethnicities, ages, countries, immunosuppression and other potential confounding factors11. Finally, would a system trained in two US hospitals perform equally well elsewhere? These questions will need to be resolved to meet registration requirements (Fig. 1).

Fig. 1: Overview of the combined host and pathogen metagenomics diagnostic pipeline.

Kalantar and colleagues performed mNGS of plasma nucleic acid from a cohort of critically ill patients to analyse both host and pathogen features. The authors applied machine learning to their metagenomics data and developed an integrated host–pathogen diagnostic model that can distinguish sepsis from non-sepsis, viral infections from fungal or bacterial infections, and patients who are ‘inflamed’ but likely not infected.

Although the work by Kalantar and colleagues is a valuable contribution to diagnostics research, further challenges arise from a clinical perspective. Will enough differentiation be found within the host signature to unravel systemic inflammatory conditions with clinical phenotypes more closely mimicking sepsis, such as major surgery, trauma and pancreatitis? Would the presented approach yield (near-) comparable data in patients who are pre-symptomatic or patients with early organ dysfunction, allowing pre-emptive intervention? Would there be added value from sequential monitoring? From a clinician’s standpoint, is rapid knowledge of the aetiologic pathogen(s) critical for optimal treatment of sepsis, as claimed? Arguably, a swift broader speciation (Gram-positive, Gram-negative, fungal, viral) with anti-infective drug susceptibilities may suffice for directing the most important drug interventions for better outcomes.

The authors have nicely demonstrated how an integrated analysis of pathogen and host response improves diagnostic accuracy. Further benefit might be achieved by adding clinical, biochemical, molecular, imaging and/or various -omic profiles to the mix12. Applying machine learning algorithms to such increasingly complex datasets would be relatively easy because cloud computing capabilities are accessible, powerful and cheap. The ultimate goal would be to rapidly direct effective clinical interventions for better outcomes, simultaneously targeting pathogen elimination and tailoring ‘personalized’ therapy to the highly heterogeneous host response to sepsis. It is important to keep in mind that, to truly impact patient outcomes, such interventions must translate into pragmatic, real-life solutions.

This study provides further encouragement and direction for leveraging large clinical and laboratory datasets with computing power and algorithms. However, some 26 years after computers could predict intensive care unit survival13 with no tangible clinical uptake, it remains no easy task to combine host and pathogen diagnostics at pace and scale with built-in around-the-clock capability and user-friendly solutions, at affordable cost and in a clinically useful timeframe of 6 hours or less. The challenge to develop a road map for the use of genome-based diagnostics beyond academic proof-of-concept for better patient treatment should now be the priority.