Introduction

Infectious diseases are a leading cause of morbidity and death worldwide. However, early detection of pathogens may be challenging in many clinical scenarios. Moreover, many common pathogens are difficult or impossible to detect using conventional microbiological tests (e.g. culture, smears, immunological tests and polymerase chain reaction (PCR) assays), which makes precise diagnosis challenging. Culture methods are time-consuming and have strict limitations. Smears, immunological tests and multiplex PCR assays test only for specific pathogens that the clinician must identify before the test is performed1. Broad-spectrum antibiotics are often administered before a pathogen has been identified, which frequently confounds specific diagnosis despite comprehensive testing and could lead to more toxic and less effective antimicrobial therapy2.

Metagenomic next-generation sequencing (mNGS) is a high-throughput method that can directly detect pathogens (e.g., bacterial species) in clinical specimens and analyze functional genes without the need to pre-select target sequences3. It is especially suitable for novel, rare, and atypical etiologies of complicated infectious diseases. Because it is fast, sensitive, culture-independent, hypothesis-free, and unbiased in pathogen detection, mNGS may become a routine diagnostic tool, partly replacing more traditional detection methods4. Some investigators have even upgraded their model, known as the ‘Microbial Index of Pathogenic Bacteria’, by implementing whole-metagenome sequencing data for species- and strain-level identification of pathogenic bacteria5. To date, mNGS has been applied in the diagnosis of bloodstream infections6,7, respiratory tract infections8,9, tuberculosis10, and meningitis and encephalitis11,12. However, these studies were limited by small sample sizes. We therefore performed a systematic review and meta-analysis of diagnostic test accuracy to identify, appraise the quality of, and synthesize the available evidence to inform the implementation of mNGS in diagnosing infectious diseases.

Methods

Literature search

Results of the present systematic review and meta-analysis are reported in accordance with the Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy studies (PRISMA-DTA)13. A comprehensive electronic literature search of the Embase, PubMed and Scopus databases was performed for relevant studies published up to December 31, 2021. The medical subject heading (i.e., ‘MeSH’) search terms included ‘infection’ and ‘Metagenomic Next-Generation Sequencing’. The reference lists of retrieved studies were also manually searched for additional, possibly eligible studies. Three reviewers independently screened the titles and abstracts and obtained the full text of potentially relevant studies; any disagreements were resolved by consensus discussion.

Inclusion and exclusion criteria

Cross-sectional and cohort studies of patients with clinically suspected infection (including meningitis, bacteremia, fungemia, osteomyelitis, and septic arthritis) that reported diagnostic test accuracy data for mNGS were included. Only English-language articles were eligible. No restrictions were imposed on the age of the study population.

Studies were excluded if they reported insufficient data to construct a 2 × 2 table (true positive, false positive, true negative, and false negative); were based on non-human samples; duplicated information already reported in other publications; did not report the reference diagnostic criteria for infection; or addressed only one specific pathogen. Abstracts, conference presentations, case reports and letters were also excluded.

Data extraction

Data were independently extracted by two reviewers using a standardized protocol and prespecified data extraction forms for diagnostic test accuracy studies14. Disagreements were resolved by a third investigator. Information regarding study characteristics (including population, period, design, country, and sample size) was extracted.

Quality assessment

The quality of the included studies was independently assessed by two reviewers, using the revised Quality Assessment of Diagnostic Accuracy Studies-2 tool15.

Statistical analysis

Pooled specificity, pooled sensitivity, pooled negative predictive value (NPV), and pooled positive predictive value (PPV) were calculated across studies based on a bivariate meta-analysis model16. They are presented graphically, with boxes marking the point estimates and horizontal lines representing the confidence intervals (CIs). A summary receiver operating characteristic (sROC) curve was drawn, and the area under the curve (AUC) was calculated to determine the performance of the diagnostic test17. The criteria for AUC classification were as follows: 0.50 (failure), 0.60–0.70 (poor), 0.70–0.80 (fair), 0.80–0.90 (good) and 0.90–1 (excellent). The Q* index, reported with its standard error (SE), is an additional measure: it is the point on the sROC curve closest to the ideal top-left corner, at which summary sensitivity (SN) and summary specificity (SP) are equal.
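As a minimal sketch of the per-study quantities that feed the pooled estimates, the following Python computes sensitivity, specificity, PPV, NPV, likelihood ratios and DOR from a single 2 × 2 table. The class name and the example counts are hypothetical and are not taken from any included study; the bivariate pooling itself is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from one diagnostic accuracy study (hypothetical container)."""
    tp: int  # index test positive, reference standard positive
    fp: int  # index test positive, reference standard negative
    fn: int  # index test negative, reference standard positive
    tn: int  # index test negative, reference standard negative

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    @property
    def lr_positive(self) -> float:
        return self.sensitivity / (1.0 - self.specificity)

    @property
    def lr_negative(self) -> float:
        return (1.0 - self.sensitivity) / self.specificity

    @property
    def dor(self) -> float:
        # Raises ZeroDivisionError if fp or fn is zero; a continuity
        # correction would be needed for such tables.
        return (self.tp * self.tn) / (self.fp * self.fn)


# Illustrative counts only (not data from the review).
example = TwoByTwo(tp=75, fp=32, fn=25, tn=68)
print(example.sensitivity, example.specificity, example.dor)
```

Predictive values depend on the prevalence in each study sample, so PPV and NPV from different studies are less directly comparable than sensitivity and specificity.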

Heterogeneity was evaluated by calculating the I2 statistic18. DerSimonian and Laird random-effects models19, which account for both between-study and within-study variability, were used to generate summary SP, SN, negative likelihood ratios (−LR), positive likelihood ratios (+LR) and diagnostic odds ratios (DOR). Heterogeneity was also assessed visually, using forest plots of sensitivity and specificity to examine the variability of study estimates, and by meta-regression in the hierarchical sROC model. A Cochran’s Q p < 0.10 and I2 > 50% indicated significant heterogeneity of SN, SP and LRs. Furthermore, the risk of bias in the included studies was assessed using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool15. Publication bias was assessed using a funnel plot and Deeks’ test20. Statistical analysis was performed using Meta-DiSc version 1.4, Stata version 12.0 (StataCorp LLC, College Station, TX, USA), and Review Manager 5 (version 5.3) (R Foundation for Statistical Computing, Vienna, Austria)21.
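The DerSimonian and Laird approach and the I2 statistic can be illustrated with a generic sketch on the log DOR scale. This is an assumption-laden illustration, not the routine implemented in the software named above: the function names are hypothetical, and the 0.5 continuity correction for zero cells is one common convention rather than something stated in this review.

```python
import math


def log_dor(tp, fp, fn, tn):
    """Return (ln DOR, variance of ln DOR) for one 2 x 2 table.

    Assumption: 0.5 is added to every cell when any cell is zero.
    """
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (c + 0.5 for c in (tp, fp, fn, tn))
    estimate = math.log((tp * tn) / (fp * fn))
    variance = 1 / tp + 1 / fp + 1 / fn + 1 / tn
    return estimate, variance


def dersimonian_laird(effects, variances):
    """Random-effects pooling with Cochran's Q and I2 (in percent)."""
    k = len(effects)
    w = [1.0 / v for v in variances]                 # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]   # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return {"pooled": pooled, "se": se, "tau2": tau2, "q": q, "i2": i2}
```

A pooled DOR and its 95% CI would then be obtained by exponentiating `pooled` and `pooled ± 1.96 * se`.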

Results

Characteristics of the included studies

After removing duplicate publications and checking reference lists for additional, potentially eligible studies, a total of 891 studies were screened. Of these, 77 publications underwent full-text review, and 20 studies were included in this systematic review. The study selection process is illustrated in the PRISMA flow diagram (Fig. 1). Six studies were performed in high-income countries, whereas 14 were conducted in low- and middle-income countries. The included studies comprised 2716 participants. Twelve of the studies were retrospective, and eight were prospective in design. Participants in the included studies were predominantly adults. The included studies were published between 2017 and 2021 (Table 1).

Figure 1 Flow diagram of study selection process.

Table 1 Characteristics of Included Studies.

Risk of bias

The risk of bias and applicability concerns according to the QUADAS-2 tool are shown in Fig. 2. All studies demonstrated unclear or low risks of bias.

Figure 2 Risk of bias and applicability concerns summary.

Meta‑analysis

Heterogeneity test

No correlation was found between the logarithm of sensitivity and the logarithm of 1 − specificity (Spearman correlation coefficient 0.0345, p = 0.092), indicating no threshold effect among the included studies. As is common in meta-analyses of diagnostic accuracy research, substantial heterogeneity was present, with sensitivity and specificity estimates varying widely. The Cochran Q for the pooled DOR was 207.82 (p = 0.00, I2 = 88.5%). This suggests that non-threshold effects were the cause of the heterogeneity; a random-effects model was therefore used in further analysis.
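The threshold-effect check can be sketched as a Spearman rank correlation between per-study sensitivity and 1 − specificity; a strong positive correlation would suggest a threshold effect. The helper names below are hypothetical. Because Spearman's coefficient depends only on ranks, applying a logarithm (or any other strictly increasing transform) to both variables leaves the result unchanged, so the sketch ranks the raw proportions and thereby avoids taking the logarithm of zero.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def threshold_effect_rho(sensitivities, specificities):
    """Spearman correlation between sensitivity and 1 - specificity."""
    false_positive_rates = [1.0 - s for s in specificities]
    return pearson(average_ranks(sensitivities),
                   average_ranks(false_positive_rates))
```

The function returns only the coefficient; a p value such as the one reported here would come from a statistics package.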

Random effect model analysis results

The reported diagnostic sensitivity of mNGS in infectious diseases ranged from 21% to 100% (Fig. 3a), and the reported specificity ranged from 14% to 100% (Fig. 3b). The pooled sensitivity was 75% (95% CI: 72–77%, I2 = 93.3%) (Fig. 3a) and the pooled specificity was 68% (95% CI: 66–70%, I2 = 97.4%) (Fig. 3b), with significant heterogeneity in both. The pooled positive LR was 2.81 (95% CI: 2.1–3.77) and the pooled negative LR was 0.32 (95% CI: 0.23–0.46) (Fig. 4a,b).

Figure 3 Forest plots of the estimates: (a) sensitivity; (b) specificity.

Figure 4 (a) Positive likelihood ratio; (b) Negative likelihood ratio.

Subgroup analysis

Subgroup analysis was also performed to explore the influence of different reference standards on the final result (Supplementary Fig. 1a–d). Two subgroups were formed based on the two reference standards: conventional testing and clinical diagnosis. Performance was consistent across the two subgroups.

Heterogeneity analysis

Four covariates, “gold standard”, “experimental design”, “age” and “country income”, were considered in the meta-regression analysis to explore potential sources of heterogeneity. None of these covariates accounted for the heterogeneity. Because more comprehensive data could not be extracted from the included studies, no further analysis was performed.

Evaluation of diagnostic accuracy

sROC curves for mNGS in infectious diseases are presented in Fig. 5a. This figure illustrates the relationship between sensitivity and 1 − specificity for the studies included in the pooled analyses. The AUC was 0.85 (SE = 0.03), which is classified as good under the criteria above. The point at which sensitivity and specificity were equal (Q*) was 0.78 (SE = 0.03). The pooled DOR was 11.94 (95% CI: 6.11–23.34) (Fig. 5b).
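The relationship among DOR, Q* and AUC can be illustrated under a simplifying assumption: a symmetric sROC curve along which the DOR is constant. Under that assumption, Q* = sqrt(DOR) / (1 + sqrt(DOR)) and AUC = DOR × ((DOR − 1) − ln DOR) / (DOR − 1)^2. The sketch below applies these closed forms to the pooled DOR of 11.94; it is not the fitting procedure used for Fig. 5a, so small differences from the reported values are expected.

```python
import math


def q_star_from_dor(dor):
    """Point on a symmetric sROC curve where sensitivity equals specificity."""
    root = math.sqrt(dor)
    return root / (1.0 + root)


def auc_from_dor(dor):
    """Area under a symmetric sROC curve with constant DOR."""
    if dor == 1:
        return 0.5                      # uninformative test
    excess = dor - 1.0
    return dor * (excess - math.log(dor)) / excess ** 2


pooled_dor = 11.94                      # pooled DOR reported in this review
print(round(q_star_from_dor(pooled_dor), 2))   # 0.78
print(round(auc_from_dor(pooled_dor), 2))      # 0.84
```

Under this assumption the Q* agrees with the reported 0.78, and the AUC of about 0.84 is close to the reported 0.85.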

Figure 5 Summary ROC curves.

Publication bias

Deeks’ test yielded no evidence of publication bias (P = 0.795) (Supplementary Fig. 2).
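For orientation, the regression behind a Deeks-type funnel plot asymmetry test can be sketched as a weighted linear regression of ln DOR on 1/sqrt(ESS), weighted by the effective sample size (ESS). The function names and return structure are hypothetical, and the sketch stops at the slope and its standard error; the P value would come from a t distribution with the returned degrees of freedom, obtained from a statistics package.

```python
import math


def effective_sample_size(n_infected, n_uninfected):
    """ESS = 4 * n1 * n2 / (n1 + n2)."""
    return 4.0 * n_infected * n_uninfected / (n_infected + n_uninfected)


def deeks_regression(ln_dors, ess):
    """Weighted regression of ln DOR on 1/sqrt(ESS), with weights ESS."""
    if len(ln_dors) != len(ess) or len(ess) < 3:
        raise ValueError("need at least three studies with matching inputs")
    x = [1.0 / math.sqrt(e) for e in ess]
    sw = sum(ess)
    x_bar = sum(w * xi for w, xi in zip(ess, x)) / sw
    y_bar = sum(w * yi for w, yi in zip(ess, ln_dors)) / sw
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(ess, x))
    sxy = sum(w * (xi - x_bar) * (yi - y_bar)
              for w, xi, yi in zip(ess, x, ln_dors))
    slope = sxy / sxx                   # asymmetry coefficient
    intercept = y_bar - slope * x_bar
    df = len(ess) - 2
    rss = sum(w * (yi - intercept - slope * xi) ** 2
              for w, xi, yi in zip(ess, x, ln_dors))
    se_slope = math.sqrt(rss / df / sxx)
    return {"slope": slope, "intercept": intercept,
            "se_slope": se_slope, "df": df}
```

A slope that differs significantly from zero would indicate funnel plot asymmetry, that is, a systematic relationship between study size and diagnostic performance.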

Discussion

To our knowledge, the present meta-analysis is the first to systematically review the use of mNGS in diagnosing infectious diseases. Conventional techniques for the detection of pathogens are largely target-dependent tests, which detect a limited number of micro-organisms. However, NGS-based metagenomic approaches are target-independent and can detect unknown pathogens22. The pooled sensitivity was 75% (95% CI: 72–77%, I2 = 93.3%) and the pooled specificity was 68% (95% CI: 66–70%, I2 = 97.4%). The AUC of 0.85 for diagnosing infection using mNGS fell in the range classified as good performance (0.80–0.90).

The DOR reflects the strength of the association between the diagnostic test result and the disease. The pooled DOR was 11.94, reflecting the diagnostic efficacy of mNGS in infectious diseases. The pooled positive LR was 2.81 (95% CI: 2.1–3.77), meaning that a positive mNGS result was 2.81 times as likely in patients with infection as in those without. The pooled negative LR was 0.32 (95% CI: 0.23–0.46), meaning that a negative mNGS result was 0.32 times as likely in patients with infection as in those without. The sROC curve combines sensitivity and specificity into a single summary. The AUC of the sROC curve was 0.85, which reflected high diagnostic efficiency.
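Likelihood ratios are usually applied by converting a pre-test probability into a post-test probability through odds. The following sketch does this for the pooled LRs; the function name is hypothetical and the pre-test probability of 50% is an arbitrary value chosen for illustration, not an estimate from this review.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Apply a likelihood ratio to a pre-test probability via odds."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


pre_test = 0.50                          # hypothetical pre-test probability
after_positive = post_test_probability(pre_test, 2.81)   # pooled +LR
after_negative = post_test_probability(pre_test, 0.32)   # pooled -LR
print(round(after_positive, 2))          # 0.74
print(round(after_negative, 2))          # 0.24
```

With this hypothetical starting point, a positive result would raise the probability of infection to about 74% and a negative result would lower it to about 24%.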

Some studies23,24,25,26 demonstrated that mNGS had diagnostic advantages over conventional methods for patients treated with empirical antibiotics before sample collection. The use of empirical antibiotics significantly lowered the detection rate of conventional methods, by approximately 20%, whereas mNGS was not affected23. This is likely because culture methods require live pathogens and are therefore easily influenced by the administration of antimicrobials. High-throughput sequencing, by contrast, needs only to identify DNA fragments of microorganisms, which may explain its relatively higher detection rate after antimicrobial treatment. Moreover, it can shorten turnaround time and detect pathogens without bias27.

NGS also has shortcomings. First, it is not sensitive for intracellular bacteria and fungi because of the difficulty of obtaining circulating genomic DNA23,28. RNA viruses require reverse transcription before deep sequencing, and the amount of DNA segments may be reduced23. Different NGS techniques may also introduce bias (Supplementary Table 1). Second, mNGS is relatively expensive. Third, the criteria for diagnosing single pathogens are unclear and are mainly based on the relative abundance of pathogens, the coverage rate or the unique reads of pathogens8,29. In addition, given the untargeted nature of mNGS, background interference is a fairly common limitation.

Our study also had limitations. First, there was considerable heterogeneity, the sources of which were extensively explored. Meta-regression results revealed that “experimental design” and “age” may have been the cause of heterogeneity. Clinical heterogeneity among the included studies also needs to be considered, including the number of patients, antibiotic treatment, sampling methods, different reference standards and other factors such as technical variations (e.g. sequencing strategies and platforms), sequence profiling software, prediction models, and batch effects. Second, the number of patients in two studies30,31 was relatively small, which may have reduced our statistical power. Third, fourfold contingency tables were not directly available for most of the studies, so some of the necessary data were calculated from the reported sensitivity and specificity. Fourth, limiting the search strategy to English-language publications could have missed some studies. Finally, the included studies may have been affected by selection bias and by the use of different reference standards for infectious diseases.
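Back-calculating a fourfold table from reported sensitivity and specificity, as mentioned in the third limitation, can be sketched as follows. This is an illustration under stated assumptions rather than the extraction procedure actually used: it assumes the numbers of reference-positive and reference-negative patients are reported, the function name is hypothetical, and the example values are invented.

```python
def reconstruct_counts(sensitivity, specificity, n_infected, n_uninfected):
    """Approximate a 2 x 2 table from summary accuracy and group sizes.

    Counts are rounded to the nearest integer, so the rebuilt table may
    differ slightly from the study's true counts.
    """
    tp = round(sensitivity * n_infected)
    fn = n_infected - tp
    tn = round(specificity * n_uninfected)
    fp = n_uninfected - tn
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}


# Invented example: 100 infected and 50 uninfected patients.
table = reconstruct_counts(0.75, 0.68, 100, 50)
print(table)   # {'tp': 75, 'fp': 16, 'fn': 25, 'tn': 34}
```

Rounding error of this kind is largest in small studies, which is one reason reconstructed tables add uncertainty to the pooled estimates.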

Conclusions

mNGS combined with conventional microbiological testing can improve diagnostic efficiency. We believe that mNGS may be a potential step forward in diagnosing infectious diseases due to its non-invasive, rapid and untargeted characteristics.