Tuberculosis (TB), an aerosol-borne disease caused by Mycobacterium tuberculosis (Mtb), is one of the top 10 causes of death worldwide and the leading cause of death from a single infectious agent1. About a quarter of the world’s population is estimated to have latent TB infection (LTBI), and about 10% of these individuals will progress to have active TB disease during their lifetime (reactivation TB)1,2. Despite longstanding intense efforts to control this disease, TB remains a global health problem that mandates better diagnostic tests and preventive strategies.

Worldwide, the diagnosis of active TB depends mostly on sputum smear microscopy with acid-fast staining or on Mtb culture3. Microscopy suffers from low sensitivity, culture can take several weeks to yield results, and neither can be applied to extrapulmonary TB4. Although the GeneXpert MTB/RIF test offers a fast result for active TB, it can be challenging to use in children and the elderly because sputum samples are difficult to obtain from these groups5,6. In addition, GeneXpert MTB/RIF requires sophisticated technology and well-trained staff, and is thus not affordable or sustainable in most healthcare systems7. Finally, none of these sputum-based tests can predict reactivation TB.

Multiple populations of immune cells with distinct functions cooperate to control Mtb infection when the bacillus enters the lungs8. During infection, alterations of immune processes in the host lead to changes in the transcriptional profiles of circulating immune cells9. The identification of immune response-based biomarkers for TB diagnosis has been researched extensively10, and many studies have focused on distinguishing latent infection from active TB11,12,13. Unfortunately, none of these gene signatures has so far been translated into a point-of-care (POC) diagnostic test. Translating gene signature-based assays into clinical practice is challenged by the difficulty of determining which of the multiple gene signatures can be implemented on a diagnostic platform that is simple and cost-effective.

Here, we report the results of an immune-based gene expression profiling study using NanoString technology in patients with active TB, patients with other pulmonary diseases (OPD), healthy donors with latent TB infection (LTBI), and uninfected healthy controls (HC). The aim of this study was to identify whole blood markers that can distinguish active TB from OPD, HC, and LTBI. We identified 23 and seven genes associated with inflammatory mechanisms that distinguished, with high sensitivity and specificity, patients with TB from OPD patients and from LTBI subjects, respectively.


Demographic and clinical characteristics of the study population

The demographic, clinical, and laboratory features of the 35 study participants are shown in Table 1. Of the 17 TB patients, 13 (76.5%) had a positive sputum smear test, three were positive by Mtb culture, and one had TB confirmed by an Mtb molecular test (Xpert MTB/RIF). Of all TB patients, eight (47.1%) were screened by Mtb culture. The median age was 41.9 (± 14.04) years in the TB group, 42.7 (± 17.06) in the LTBI group, 43.8 (± 9.70) in the OPD group, and 32.5 (± 3.53) in the HC group.

Table 1 Demographic and clinical data of study population.

Sample clustering

We evaluated 594 inflammatory genes in whole blood from 17 TB patients and 18 controls (seven with LTBI, six HC, and five with OPD). We further organized these groups in order to identify whole blood biomarkers to diagnose active TB (TB vs. OPD) and candidates to predict TB reactivation (LTBI vs. TB). First, we evaluated all four study groups together to verify whether the gene panel would be able to distinguish them. Figure 1 shows a heatmap of the normalized data generated via unsupervised hierarchical clustering. The mRNA expression levels of 46 of the 594 genes segregated the study participants into two large clusters. Transcripts that showed increased expression (red) clustered among TB patients, while those that showed decreased expression clustered among the groups without active TB. Two individuals, one from the LTBI group (LTBI1) and one from the HC group (HC3), clustered with the patients with active TB.

Figure 1

Heatmap showing the different expression patterns of 46 proinflammatory genes out of 594 genes. Heatmap of gene expression levels in patients diagnosed with active TB (red), LTBI subjects (green), OPD patients (yellow), and uninfected donors, HC (blue). Expression levels are scaled from dark blue (low expression) to dark red (high expression). The heatmap was generated in R (version 3.6.3) with the ComplexHeatmap package (version 2.0.0).

Gene expression data of TB and OPD donors

Asthma is a chronic, non-infectious inflammatory airway disease that healthcare providers need to distinguish promptly from TB. We identified 23 candidate genes that differentiated most of the TB patients from the asthma patients (OPD group) (p < 0.001 and fold change [FC] > 2) (Fig. 2A). Principal component analysis (PCA) of the gene expression data showed significant separation between TB and OPD patients (Fig. 2B). The findings are also shown in a volcano plot of all data, in which genes with p < 0.05 and an absolute log2-fold change higher than 2 are displayed in orange (Fig. 2C). These analyses identified genes that can be used to distinguish TB and OPD patients, including CD274, PDCD1LG2, and FCGR1A/B (p-value < 0.0001 and log2-fold change > 2.6) (Fig. 2C).
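The relationship between the two effect-size scales used here (fold change and log2-fold change) can be illustrated with a minimal Python sketch. It is not the NanoStringNorm workflow used in this study: the gene names, mean counts, and p-values are hypothetical, and the function names `log2_fold_change` and `flag_genes` are introduced only for illustration. The thresholds follow the volcano plot legend (absolute fold change ≥ 4, that is, |log2 FC| ≥ 2, and p ≤ 0.05).

```python
import math


def log2_fold_change(mean_case, mean_control):
    """Return log2(mean_case / mean_control) for two positive mean counts."""
    return math.log2(mean_case / mean_control)


def flag_genes(stats, min_abs_log2fc=2.0, max_p=0.05):
    """Return the sorted names of genes passing both cut-offs.

    `stats` maps gene name -> (mean_case, mean_control, p_value).
    """
    selected = []
    for gene, (mean_case, mean_control, p_value) in stats.items():
        lfc = log2_fold_change(mean_case, mean_control)
        if abs(lfc) >= min_abs_log2fc and p_value <= max_p:
            selected.append(gene)
    return sorted(selected)


# Hypothetical normalized mean counts and p-values (illustrative only).
example = {
    "GENE_A": (800.0, 100.0, 0.0001),  # 8-fold increase, log2 FC = 3
    "GENE_B": (300.0, 100.0, 0.0200),  # 3-fold increase, log2 FC below 2
    "GENE_C": (25.0, 100.0, 0.0300),   # 4-fold decrease, log2 FC = -2
    "GENE_D": (900.0, 100.0, 0.2000),  # large change, but p > 0.05
}

print(flag_genes(example))  # ['GENE_A', 'GENE_C']
```

Under these assumptions, a gene with a three-fold change passes an FC > 2 filter but not the |log2 FC| ≥ 2 threshold of the volcano plot, which is why the two criteria select different numbers of genes.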

Figure 2

Identification of markers for TB diagnosis. (A) Heatmap of the expression levels of 23 genes in TB (red) and OPD (yellow) patients. (B) PCA score plot of TB and OPD patients. (C) Volcano plot showing the distribution of the gene expression fold changes in TB patients relative to OPD patients. Genes with absolute fold change ≥ 4 and p-value ≤ 0.05 are indicated in orange. Expression levels are scaled from dark blue (low expression) to dark red (high expression). Figures were generated with R (version 3.6.3) using the ComplexHeatmap package (version 2.0.0), the prcomp function from the stats package (version 3.6.3), and the NanoStringNorm package for the heatmap, PCA, and volcano plot, respectively.

Gene expression data of TB and LTBI donors

We also compared gene expression levels between the TB and LTBI groups, aiming to identify candidate markers able to differentiate these groups. Both the heatmap (Fig. 3A) and the PCA (Fig. 3B) show 7 of the 594 inflammatory genes that significantly differentiate those groups (p < 0.001 and FC > 2). Volcano plot analysis revealed two promising genes (CCR2 and C1QB, p-value < 0.0001 and log2-fold change > 1.1 and > 2.4, respectively) that can be further tested as possible markers of TB reactivation (Fig. 3C).

Figure 3

Identification of markers for TB progression. (A) Heatmap of the expression levels of seven genes in TB (red) and LTBI (green) subjects. (B) PCA score plot of TB and LTBI subjects. (C) Volcano plot showing the distribution of the gene expression fold changes in TB patients relative to LTBI subjects. Genes with absolute fold change ≥ 4 and p-value ≤ 0.05 are indicated in orange. Expression levels are scaled from dark blue (low expression) to dark red (high expression). Figures were generated with R (version 3.6.3) using the ComplexHeatmap package (version 2.0.0), the prcomp function from the stats package (version 3.6.3), and the NanoStringNorm package for the heatmap, PCA, and volcano plot, respectively.

Receiver operating characteristic (ROC) curve analysis

ROC analysis was used to evaluate the individual discriminatory performance of the genes that showed a p-value < 0.001 on the heatmap for each comparison between study groups. The area under the curve (AUC), sensitivity, specificity, and optimal cut-off points are shown for TB diagnosis (Table 2) and for the differentiation of TB and LTBI subjects (Table 3). The genes CD274, CEACAM1, CR1, FCGR1A/B, IFITM1, IRAK3, LILRA6, MAPK14, and PDCD1LG2 (all with AUC = 1.0 and 100% sensitivity and specificity) seem to be promising targets to distinguish TB and OPD patients (Table 2) (see Supplementary Fig. S1). Table 3 presents seven possible candidates to be further evaluated as predictors of TB progression, including CCR2, which showed an AUC = 1.0 and both sensitivity and specificity of 100% (see Supplementary Fig. S2).
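For readers unfamiliar with how these quantities relate, the following minimal Python sketch computes the AUC, sensitivity, specificity, and a cut-off for a single gene. It is illustrative only: the ROC curves in this study were produced in GraphPad Prism, the criterion used to choose the optimal cut-off is not stated in the text, and the sketch assumes the Youden index (sensitivity + specificity − 1) and assumes that higher expression indicates TB. The function name `roc_summary` and the expression values are hypothetical.

```python
def roc_summary(case_values, control_values):
    """Summarize single-marker discrimination, assuming higher values indicate a case.

    Returns the AUC together with the cut-off that maximizes the Youden index
    (an assumed criterion) and the sensitivity and specificity at that cut-off.
    """
    n_case, n_ctrl = len(case_values), len(control_values)

    # AUC as the probability that a random case exceeds a random control
    # (Mann-Whitney formulation; ties count as one half).
    wins = 0.0
    for c in case_values:
        for h in control_values:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    auc = wins / (n_case * n_ctrl)

    # Scan every observed value as a candidate cut-off ("positive" if value >= cut-off).
    best = None
    for cutoff in sorted(set(case_values) | set(control_values)):
        sens = sum(v >= cutoff for v in case_values) / n_case
        spec = sum(v < cutoff for v in control_values) / n_ctrl
        youden = sens + spec - 1.0
        if best is None or youden > best["youden"]:
            best = {"cutoff": cutoff, "sensitivity": sens,
                    "specificity": spec, "youden": youden}
    return {"auc": auc, **best}


# Hypothetical log2 normalized counts for one gene (illustrative only).
tb_values = [9.1, 8.7, 10.2, 9.8]
opd_values = [6.0, 7.2, 6.5]
print(roc_summary(tb_values, opd_values))
```

When the two groups do not overlap at all, as in this toy example, the AUC, sensitivity, and specificity all equal 1.0. With groups as small as those analyzed here, such perfect separation can arise more easily than in a large cohort, which is one reason validation in a larger population is needed.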

Table 2 ROC analysis, sensitivity and specificity of candidate genes for TB diagnosis.
Table 3 ROC analysis, sensitivity and specificity of candidate genes to predict TB reactivation.


The World Health Organization (WHO) identified the need for a non-sputum-based test as a high priority for TB diagnosis and suggested that a rapid biomarker-based test should be easy to perform and implement at health posts; should increase the number of patients diagnosed with TB; should have sensitivity > 98% among patients with smear-positive, culture-positive pulmonary TB and ≥ 68% among those with smear-negative, culture-positive pulmonary TB in adults; and would ideally be able to diagnose adults and children, and pulmonary and extrapulmonary TB alike14. Here, we performed a multiplex gene expression analysis of more than 500 inflammatory genes in a single assay on whole blood samples. With this approach, of the 30 genes identified herein, 23 were candidate targets to diagnose active TB and seven can be validated as biomarkers to distinguish LTBI from TB. All 30 genes showed sensitivity and specificity > 82% and ROC AUC > 0.8.

A major challenge in interrupting the TB transmission cycle is to predict when an individual with LTBI will develop active TB. Here, we identified seven genes that were able to discriminate TB patients from LTBI individuals, all presenting high sensitivity and specificity in ROC curve analysis (Table 3). The expression of five (CCRL2, C1QB, C2, LILRB4, and CCR2) of the seven genes placed donor TB8 (a TB patient) in the cluster enriched with the LTBI group (Fig. 3). It is possible that the other two genes (MSR1 and MAPK14), whose expression pattern in this donor was similar to that of the TB patients, may be the first set of genes to undergo a change in expression level during progression to active TB. To confirm these findings, it is necessary to evaluate the expression of these genes in a cohort of LTBI subjects.

We identified 30 candidate genes to be further tested for TB diagnosis and as biomarkers for TB progression. Of the 23 genes suggested to be suitable for TB diagnosis, ten were related to the adaptive immune response, ten were involved in the innate immune response, and the other three genes (JAK2, JAK3, and LY96) were not specifically related to either. Conversely, for TB progression, five of the seven genes were components of the innate immune system and were increased in TB patients relative to LTBI volunteers (Table 4). These data suggest that activation of the innate immune response is involved in the progression to active TB in latently infected subjects.

Table 4 Annotation of selected genes based on ROC curve analysis.

Previously identified genes that can discriminate TB patients from non-TB patients or indicate TB risk11,13,15,16,17,18,19,20,21,22,23,24,25 either do not meet the minimum sensitivity requirements for a POC test in adults regardless of HIV status (95% in smear-positive culture-confirmed cases and 60–80% in smear-negative culture-confirmed cases), or were proposed as gene signature-based tests that are very difficult to implement. Here, although the small number of participants was a limitation, we identified single candidate genes for TB diagnosis and progression, all of them presenting high AUC, sensitivity, and specificity.

This study provides valuable information for the development of new diagnostic tests for TB. Once validated in a larger population-based study, the expression of the genes identified herein could form the basis of new tools that overcome the limitations of the currently available diagnostic tests, including low sensitivity, long turnaround time, and the requirement for sputum sample collection. In addition, some of the genes can distinguish sick people with TB from those who are latently infected. These targets need to be further validated as possible biomarkers to predict TB reactivation in a prospective cohort study.


Study participants

Subjects were recruited between November 2015 and December 2016. Written informed consent was obtained from all participants. Our study included 35 participants: 17 patients with active TB and 18 controls, of whom seven were healthy donors with latent M. tuberculosis infection (LTBI), six were uninfected healthy controls (HC), and five were patients with asthma (OPD). All participants were recruited at the Instituto Brasileiro para Investigação de Tuberculose (IBIT), Bahia, Brazil, and the 2° Centro de Saúde Rodrigo Argolo, Bahia, Brazil. TB patients were confirmed to have active pulmonary TB by chest X-ray and by a positive sputum smear microscopy and/or culture result. Symptomatic patients with negative sputum smear microscopy had TB confirmed by culture. TB patients who were not screened by sputum smear microscopy and/or culture were diagnosed with the Xpert MTB/RIF system. Blood samples were collected prior to TB treatment. Household contacts of TB patients were assigned to either the LTBI or the HC group according to the QuantiFERON-TB (QFT) Gold In-Tube test. Those with a negative QFT Gold In-Tube test (cut-off ≤ 0.35 IU/mL) were considered healthy controls, while household contacts with positive results (cut-off > 0.35 IU/mL) were considered LTBI subjects. The OPD group was composed of patients who sought care with suspected pulmonary TB but were negative by both sputum smear microscopy and culture. Individuals who tested positive for human immunodeficiency virus and patients taking immunosuppressive drugs were excluded. All subjects were between 18 and 65 years old.

RNA isolation

For each donor, we collected 2.5 mL of peripheral blood in a PAXgene Blood RNA tube (PreAnalytiX). RNA was isolated and purified with the PAXgene Blood RNA kit (Qiagen), according to the manufacturer’s protocol, for gene expression analysis by NanoString technology. RNA quantity and quality were assessed with a NanoDrop spectrophotometer.


We performed gene expression assays at the Molecular Oncology Research Center, Barretos Cancer Hospital, Barretos, Brazil, using NanoString technology with the nCounter Immunology Panel, which contains 594 targets and 15 internal reference genes. Up to 100 ng of total RNA per sample was used, and the protocol was performed according to the manufacturer’s recommendations. Briefly, RNA was hybridized with reporter and capture probes (NanoString Technologies) and incubated at 67 °C for 21 h. Samples were then loaded onto the automated nCounter Prep Station (NanoString Technologies) for sample purification and immobilization in cartridges. Finally, cartridges were transferred to the nCounter Digital Analyzer (NanoString Technologies), which captured images in 280 fields of view (FOVs) to provide all gene counts.

Data analysis

The files corresponding to each cartridge were initially analyzed in nSolver Software (NanoString Technologies) for quality control assessment. Then, we analyzed the data in the R statistical environment (version 3.6.3)26. Distributions of raw counts were evaluated with the quantro package27. Normalization and differential expression analysis were carried out with the NanoStringNorm package28. Raw data were normalized with the geometric mean of the positive control and housekeeping genes. Hierarchical clustering of differentially expressed genes, with a distance based on the Pearson correlation coefficient, was performed with the ComplexHeatmap package29. The ability of genes to discriminate the study groups was evaluated with receiver operating characteristic (ROC) curves, and the graphs were created with GraphPad Prism.
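The idea behind geometric-mean normalization can be sketched as follows. This is a simplified Python illustration, not the NanoStringNorm implementation used in this study, and the exact package options are not stated in the text. It assumes that each sample is rescaled so that the geometric mean of its reference probes (positive controls or housekeeping genes) matches the average geometric mean across samples; the sample names, counts, and function names are hypothetical, and all counts are assumed to be positive.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive counts."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def scaling_factors(reference_counts):
    """Compute one scaling factor per sample.

    `reference_counts` maps sample -> list of counts for the reference probes.
    Each factor is the average geometric mean across samples divided by the
    geometric mean of that sample (an assumed, simplified scheme).
    """
    geo = {sample: geometric_mean(counts)
           for sample, counts in reference_counts.items()}
    target = sum(geo.values()) / len(geo)
    return {sample: target / g for sample, g in geo.items()}


def normalize(raw_counts, factors):
    """Multiply every gene count in each sample by that sample's factor."""
    return {sample: {gene: count * factors[sample]
                     for gene, count in genes.items()}
            for sample, genes in raw_counts.items()}


# Hypothetical reference-probe and target counts for two samples.
housekeeping = {"S1": [100.0, 400.0], "S2": [200.0, 800.0]}
raw = {"S1": {"CD274": 50.0}, "S2": {"CD274": 100.0}}

factors = scaling_factors(housekeeping)
print(factors)
print(normalize(raw, factors))
```

In this toy example, sample S2 yields twice the reference signal of S1, so its counts are scaled down and those of S1 are scaled up; after normalization the two samples show approximately the same CD274 value, indicating that the raw difference reflected input amount rather than expression.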

Ethical statement

The study was approved by the Research Ethics Council (CEP) of Maternidade Climério de Oliveira, Universidade Federal da Bahia, CAAE: 48844315.8.0000.5543. Following the basic norms of CEP, Resolution 466/12, all study participants were informed verbally and in writing about the objectives of the study, their participation, and the contacts for the IRB and the study coordinator. All participants signed the consent form, which assured confidentiality and the freedom to leave the study, and all methods were performed in accordance with the relevant guidelines and regulations.