An interferon-inducible neutrophil-driven blood transcriptional signature in human tuberculosis

Journal: Nature
Volume: 466
Pages: 973–977
DOI: 10.1038/nature09247

Tuberculosis (TB), caused by infection with Mycobacterium tuberculosis, is a major cause of morbidity and mortality worldwide. Efforts to control it are hampered by difficulties with diagnosis, prevention and treatment1, 2. Most people infected with M. tuberculosis remain asymptomatic, termed latent TB, with a 10% lifetime risk of developing active TB disease. Current tests, however, cannot identify which individuals will develop disease3. The immune response to M. tuberculosis is complex and incompletely characterized, hindering development of new diagnostics, therapies and vaccines4, 5. Here we identify a whole-blood 393 transcript signature for active TB in intermediate and high-burden settings, correlating with radiological extent of disease and reverting to that of healthy controls after treatment. A subset of patients with latent TB had signatures similar to those in patients with active TB. We also identify a specific 86-transcript signature that discriminates active TB from other inflammatory and infectious diseases. Modular and pathway analysis revealed that the TB signature was dominated by a neutrophil-driven interferon (IFN)-inducible gene profile, consisting of both IFN-γ and type I IFN-αβ signalling. Comparison with transcriptional signatures in purified cells and flow cytometric analysis suggest that this TB signature reflects changes in cellular composition and altered gene expression. Although an IFN-inducible signature was also observed in whole blood of patients with systemic lupus erythematosus (SLE), their complete modular signature differed from TB, with increased abundance of plasma cell transcripts. Our studies demonstrate a hitherto underappreciated role of type I IFN-αβ signalling in the pathogenesis of TB, which has implications for vaccine and therapeutic development. Our study also provides a broad range of transcriptional biomarkers with potential as diagnostic and prognostic tools to combat the TB epidemic.

Main

Blood transcriptional profiling has improved diagnosis and understanding of disease pathogenesis6, 7, 8, 9. Such a comprehensive, unbiased survey could provide insights into the immunopathogenesis of TB and so lead to advances in control of this complex disease. Genome-wide transcriptional profiles were generated from blood from patients with active TB (before treatment), patients with latent TB and healthy controls (Supplementary Fig. 1, and Supplementary Tables 1 and 2). A distinct 393-transcript signature was defined in patients with active TB (training set, London), using a combination of expression-level and statistical filters and hierarchical clustering (Supplementary Fig. 2a, b(i), Supplementary Table 3 and Methods). We then applied the 393-transcript list to two independent cohorts (UK test set; South African validation set). Hierarchical clustering of transcriptional profiles showed that patients with active TB clustered independently of patients with latent TB and healthy controls, in both intermediate-burden (London) and high-burden (South Africa) regions, with a significant association between cluster and study group (Fisher’s exact test: P = 0.00001365, UK (Fig. 1a); P = 5.79 × 10^−10, South Africa (Fig. 1b)). This was independent of ethnicity, age or gender (Supplementary Fig. 2b(ii), c, d). The transcriptional profiles of 10–25% of patients with latent TB (5/21 test set, 3/31 validation set) clustered with those of patients with active TB (Fig. 1a, b). The k-nearest neighbour class prediction, using the 393-transcript list, gave a sensitivity of 61.67%, specificity of 93.75% and an indeterminate rate of 1.9% for the test set (Supplementary Table 4), with five patients with latent TB classified as active TB and four patients with active TB misclassified. In the validation set the sensitivity was 94.12%, specificity 96.67% and indeterminate rate 7.8%. The UK patients were of diverse ethnicity, potentially infected with different M. tuberculosis lineages, suggesting the signature may be independent of bacterial clade, although molecular typing was not available. The proportion of latent patients having a transcriptional signature similar to that of active TB was consistent with the expected frequency of patients at risk of progression to active disease3, potentially identifying patients with latent TB with sub-clinical active disease or higher burden latent infection.

Figure 1: A distinct whole-blood 393-gene transcriptional signature of active TB.

The 393 transcripts differentially expressed in whole blood of patients with active and latent TB and healthy controls. a, Test set. b, Validation set (South Africa) profiles, ordered by hierarchical clustering (Spearman correlation with average linkage) creating a condition tree, upper horizontal edge of heatmap; study grouping (clinical phenotype) is shown by the coloured blocks at each profile base. Heatmap rows, genes; columns, participants. c, Profiles were grouped according to radiographic extent of disease and the median ‘molecular distance to health’ compared between groups (Methods) (Kruskal–Wallis analysis of variance, Dunn’s multiple comparison, P < 0.0001). d, Patients with active TB at 0, 2 and 12 months after initiation of anti-mycobacterial treatment. The median ‘molecular distance to health’ for each time point was compared (Friedman’s repeated measures test, Dunn’s multiple comparison). Horizontal bars, median, 5th and 95th percentiles.

Four out of 21 patients with active TB in the test set, also misclassified by class prediction, clustered with healthy controls and patients with latent TB (filled circle, hash symbol, and filled square and diamond in Fig. 1a), demonstrating molecular heterogeneity that could reflect clinical variance. To address this, radiographic extent of disease was assessed by three physicians, blinded to clinical diagnosis and transcriptional profile (Supplementary Fig. 3)10. The median ‘molecular distance to health’11, a composite of the number of transcripts in a profile that significantly differ from the healthy control baseline and the degree of that difference, was significantly higher for those with advanced disease than for those with minimal or no disease (Fig. 1c). We show for the first time that the transcriptional signature in blood correlates with extent of disease in patients with active TB, and reflects changes at the site of disease. The transcriptional signature was diminished in patients with active TB after 2 months, and completely extinguished by 12 months after treatment, with ‘molecular distance to health’ at 12 months significantly lower than at baseline pretreatment (Fig. 1d and Supplementary Fig. 4), reflecting radiographic improvement. Thus the blood transcriptional signature of patients with active TB could be used to monitor efficacy of treatment, and is reflective of the host response to infection with M. tuberculosis.

The 393-gene active TB signature may reflect common inflammatory responses evoked during many diseases. We therefore identified a TB-specific 86-gene whole-blood signature through analysis of significance12, compared with patients with other bacterial and inflammatory diseases (Supplementary Fig. 5, and Supplementary Tables 5 and 6). This 86-gene signature was then tested against patients normalized to their own controls from seven independent data sets by class prediction (k-nearest neighbours) (Fig. 2a). Sensitivities in the TB training and validation sets were 92% and 90% respectively, distinguishing active TB from other diseases with a pooled specificity of 83% (Supplementary Table 7). As with the 393-gene signature, this 86-gene signature was diminished in response to treatment (Fig. 2b) and reflected the same heterogeneity in identical samples from patients (Supplementary Fig. 6).

Figure 2: A distinct whole-blood 86-gene transcriptional signature of active TB is distinct from other diseases.

a, Comparison of the 86-gene signature in patients with TB and other diseases normalized to their own controls; TB (training, n = 13; control, n = 12), TB (SA, n = 20; control, n = 12), group A Streptococcus (Strep; n = 23; control, n = 12), Staphylococcus (Staph; n = 40; control, n = 12), Still’s disease (Still’s; n = 31; control, n = 22), adult SLE (SLE; n = 29; control, n = 16) and paediatric SLE (pSLE; n = 49; control, n = 11) patients. b, Expression levels of the 86-gene signature after 2 and 12 months of treatment in patients with TB. (Scale as in Fig. 1.)

To identify functional components of the transcriptional host response during active TB, we used a modular data-mining strategy, using sets of genes that are coordinately expressed in different diseases and defined as specific modules, often demonstrating coherent functional relationships through unbiased literature profiling7. The blood modular signature of patients with active TB compared with healthy controls (filtering out only undetected transcripts, α = 0.01, in at least two individuals) was similar in all three TB data sets (Fig. 3a and Supplementary Fig. 7), confirming the reproducibility of the transcriptional signature. The modular TB signature revealed decreased abundance of B-cell (Module, M1.3) and T-cell (M2.8) transcripts and increased abundance of myeloid-related transcripts (M1.5 and M2.6). The largest proportion of transcripts changing in a given module in TB was within the IFN-inducible module (M3.1; 75–82% of IFN-module transcripts (Fig. 3a and Supplementary Fig. 7)). Because a type I IFN-inducible signature, linked with disease pathogenesis, has been demonstrated in peripheral blood mononuclear cells from patients with SLE13, 14, we compared whole-blood modular signatures from patients with other diseases. Patients with SLE demonstrated over-representation of the IFN-inducible module (M3.1 (Fig. 3a) quantified in Supplementary Fig. 8), but displayed a plasma-cell-related module absent in TB (M1.1 (Fig. 3a and Supplementary Fig. 8)). The blood modular signature from patients with group A Streptococcus or Staphylococcus infection, or Still’s disease, showed minimal to no change in the IFN-inducible module (M3.1) but marked over-representation of the neutrophil-related module (M2.2), distinguishing these diseases from TB (Fig. 3a and Supplementary Fig. 8). Thus the IFN-inducible signature is not common to all inflammatory responses, but is preferentially induced during some diseases, potentially reflecting protection or pathogenesis. 
Although SLE and TB share common inflammatory components such as an IFN-inducible response, the overall pattern of transcriptional changes (Fig. 3a) and their amplitude (Supplementary Fig. 8) distinguishes one disease from another.

Figure 3: Whole-blood transcriptional signature of active TB reflects distinct changes in cellular composition and gene expression.

a, Gene expression (disease versus healthy controls) of TB (test set) and different diseases mapped within a pre-defined modular framework. Spot intensity (red, increased; blue, decreased) indicates transcript abundance. Functional interpretations previously determined by unbiased literature profiling shown by colour-coded grid. Whole blood (test set patients with active TB and controls) analysed by flow cytometry for (b) CD3+CD4+ T cells, CD3+CD8+ T cells and CD19+CD20+ B cells, and (c) CD14+ monocytes, CD14+CD16+ inflammatory monocytes and CD16+ neutrophils. Error bars, median; **P<0.01, *P<0.05, Mann–Whitney U-test.

The TB blood-transcriptional signature could represent altered cell composition or changes in gene expression in discrete cellular populations. Percentages of B cells, and of T cells carrying the CD4 and CD8 antigens, assessed by flow cytometry, were significantly diminished in patients with active TB, with reduced numbers of total and central memory T cells carrying the CD4 antigen (Fig. 3b and Supplementary Fig. 9a, b), in keeping with previous studies15. That the reduction in T-cell transcripts revealed by the modular analysis (Fig. 3a) resulted from changes in cell numbers in the blood was further confirmed because expression of these transcripts in purified T cells from the same individuals did not differ between patients with TB and healthy controls (Supplementary Fig. 9c). In contrast, the increase in myeloid transcripts (M1.5, M2.6 (Fig. 3a and Supplementary Fig. 7)) in the blood of patients with active TB was not accounted for by changes in monocytes (CD14+, CD16−) or neutrophils (CD16+, CD14−), although inflammatory monocytes (CD14+, CD16+) were increased (Fig. 3c and Supplementary Fig. 10a), as in other diseases16. Increased abundance of myeloid transcripts was less pronounced in purified monocytes (CD14+) (Supplementary Fig. 10b), which suggests involvement of other cells.

Pathway analysis confirmed IFN signalling as the most significantly over-represented pathway in the 393-gene signature (Fisher’s exact test, Benjamini–Hochberg correction for multiple testing, P<0.0000001 (Supplementary Fig. 11)). Genes downstream of both IFN-γ and type I IFN-αβ receptor signalling were significantly over-represented in blood from patients with active TB (Fig. 4a–c). IFN-α2 and IFN-γ proteins were not elevated in serum from patients with active TB, although the IFN-inducible chemokine CXCL10 (IP10) was significantly increased (Supplementary Fig. 11c–e).

Figure 4: Interferon-inducible gene expression in active TB.

Canonical pathway of Ingenuity pathways analysis for interferon signalling; symbol indicates gene function (legend on right). Transcripts over-represented in test set patients with active TB shaded red. a, Type II IFN-γ. b, Type I IFN-αβ signalling. Transcript abundance of representative IFN-inducible genes in active TB from (c) whole blood and (d) separated blood leucocyte population. Transcript abundance/expression is normalized to the median of the healthy controls.

Although IFN-γ is protective during immune responses to intracellular pathogens, including mycobacteria4, 17, 18, the role of type I IFN-αβ is less clear. Type I IFN signalling is crucial for defence against viral infections but may be detrimental during bacterial19, including mycobacterial, infections20, 21. Absence of IFN-αβ signalling in mice improved outcome after infection with highly virulent20, 21, 22, but not less virulent, strains of M. tuberculosis23. Highly virulent strains of M. tuberculosis induce higher levels of type I IFNs20. There are reports of TB reactivation during IFN-α treatment for hepatitis D viral infection24. The increase in type I IFN-αβ-inducible transcripts in the blood of patients with active TB (Fig. 4c), correlating with disease severity, provides the first data in human disease to support a role for type I IFNs in the pathogenesis of TB. These IFN-inducible transcripts were overexpressed in purified blood neutrophils and to a lesser extent monocytes, but not T cells carrying the CD4 and CD8 antigens, from patients with active TB, compared with healthy controls (Fig. 4d; top to bottom: OAS1, IFI6, IFI44, IFI44L, OAS3, IRF7, IFIH1, IFI16, IFIT3, IFIT2, OAS2, IFITM3, IFITM1, GBP1, GBP5, STAT1, GBP2, TAP1, STAT1, STAT2, IFI35, TAP2, CD274, SOCS1, CXCL10, IFIT5). Neutrophils are the predominant cell type infected with rapidly replicating M. tuberculosis in patients with TB25. Evidence from genetically susceptible mice suggests that neutrophils contribute to pathology during infection with M. tuberculosis26. Our studies support a role for neutrophils in the pathogenesis of TB, which may result from over-activation by IFN-γ and type I IFNs.

Earlier microarray studies, limited by small numbers of patients and custom microarrays, reported a small number of genes in blood associated with TB27, 28. Here we provide the first complete description of the human blood transcriptional signature of TB. The signature of active TB, observed in 10–20% of patients with latent TB, may identify those individuals who will develop active disease, facilitating targeted preventative therapy, but longitudinal studies are needed to assess this. That the TB signature is dominated by type I IFN-signalling and reflects extent of lung disease, may indicate the process leading to disease susceptibility. These data improve our understanding of the fundamental biology of TB and may offer future leads for diagnosis and treatment.

Methods

Participant recruitment and patient characterization

The local Research Ethics Committees (REC) at St Mary’s Hospital, London, UK (REC 06/Q0403/128) and the University of Cape Town, Cape Town, South Africa (REC 012/2007) approved the study. All participants were older than 18 years and gave written informed consent. Participants were recruited from St Mary’s Hospital and Hammersmith Hospital, Imperial College Healthcare NHS Trust, London, UK, Hillingdon Hospital, the Hillingdon Hospitals NHS Trust, Uxbridge, UK, and the Ubuntu TB/HIV clinic, Khayelitsha, Cape Town, South Africa. Patients were prospectively recruited and sampled before any anti-mycobacterial treatment was started, but only included in the final analysis if they met the full clinical criteria for their relevant study group. A subset of patients with active TB from the first cohort recruited in London was also sampled at 2 and 12 months after the start of therapy. Patients who were pregnant, immunosuppressed or who had diabetes or autoimmune disease were ineligible and excluded from this study. In South Africa, all participants had routine HIV testing using the Abbott Determine HIV1/2 rapid antibody assay test kit (Abbott Laboratories), and patients with positive HIV tests were excluded. Active TB was confirmed by laboratory isolation of M. tuberculosis on mycobacterial culture of a respiratory specimen (either sputum or bronchoalveolar lavage fluid), with sensitivity testing performed by the Royal Brompton Hospital Mycobacterial Reference Laboratory, London, UK, or the Reference Laboratory of the National Health Laboratory Service, Groote Schuur Hospital, Cape Town, South Africa. In the UK, patients with latent TB were recruited from those referred to the TB clinic with a positive tuberculin skin test (TST) and a positive result using an IFN-γ release assay (IGRA). Participants in South Africa with latent TB were recruited from individuals self-referring to the voluntary testing clinic at the Ubuntu TB/HIV clinic, and IGRA positivity alone was used to confirm the diagnosis, irrespective of TST result (although this was still performed). Healthy control participants were recruited from volunteers at the MRC National Institute for Medical Research, Mill Hill, London, UK. To meet the final criteria for study inclusion, healthy volunteers had to be negative by both TST and IGRA.

Tuberculin skin testing

This was performed according to the UK guidelines29 using 0.1 ml (2 tuberculin units) tuberculin purified protein derivative (RT23, Statens Serum Institut). A positive TST was defined as ≥6 mm if BCG (bacille Calmette–Guérin) unvaccinated or ≥15 mm if BCG vaccinated, as per the UK national guidelines30.

IFN-γ release assay testing

The QuantiFERON Gold In-Tube Assay (Cellestis) was performed according to the manufacturer’s instructions.

Total and differential leucocyte counts

Two millilitres of whole blood were collected into Terumo Venosafe 5 ml K2-EDTA tubes (Terumo Europe). Samples were then analysed within 4 h using the Nihon Kohden MEK-6400 Automated Hematology Analyser (Nihon Kohden).

Assessment of radiographic extent of disease

Plain chest radiographs were obtained as digital images for all patients recruited in London and graded by three independent clinicians, blinded to the transcriptional profiles and the clinical data, using a modified version of the classification system of the US National Tuberculosis and Respiratory Disease Association10. This system characterizes the radiographic extent of disease into ‘minimal’, ‘moderately advanced’ or ‘far advanced’ stages, according to criteria based upon the density and extent of lesions and presence or absence of cavitation. We modified the system for use in our study so that it also included a classification of ‘No disease’, and accounted for the presence of pleural disease or lymphadenopathy. The system was then converted into a decision tree to aid classification (Supplementary Fig. 3a).

RNA sampling, extraction and processing for microarray analysis

Three millilitres of whole blood were collected into Tempus tubes (Applied Biosystems), vigorously mixed immediately after collection, and stored between −20 and −80 °C before RNA extraction. RNA was isolated from training set samples using 1.5 ml whole blood and the PerfectPure RNA Blood Kit (5 PRIME). Test and validation (South Africa) set samples were extracted from 1 ml of whole blood using the MagMAX-96 Blood RNA Isolation Kit (Applied Biosystems/Ambion) according to the manufacturer’s instructions. Two and a half micrograms of isolated total RNA was then globin-reduced using the GLOBINclear 96-well format kit (Applied Biosystems/Ambion) according to the manufacturer’s instructions. Total and globin-reduced RNA integrity was assessed using an Agilent 2100 Bioanalyser (Agilent Technologies), giving RNA integrity numbers of 7–9.5. RNA yield was assessed using a NanoDrop 1000 spectrophotometer (NanoDrop Products, Thermo Fisher Scientific). Biotinylated, amplified antisense complementary RNA (cRNA) targets were then prepared from 200 to 250 ng of the globin-reduced RNA using the Illumina CustomPrep RNA amplification kit (Applied Biosystems/Ambion). Seven hundred and fifty nanograms of labelled cRNA was hybridized overnight to Illumina Human HT-12 V3 BeadChip arrays (Illumina), which contained more than 48,000 probes. The arrays were then washed, blocked, stained and scanned on an Illumina BeadStation 500 following the manufacturer’s protocols. Illumina BeadStudio version 2 software (Illumina) was used to generate signal intensity values from the scans.

Separated cells isolation and RNA extraction

Whole blood was collected in EDTA. Neutrophils (CD15+), monocytes (CD14+) and T cells carrying the CD4 and CD8 antigens were isolated sequentially using Dynabeads according to the manufacturer’s instructions. RNA was extracted from whole blood (5 PRIME PerfectPure Kit) or separated cell populations (Qiagen RNeasy Mini Kit) and stored at −80 °C until use.

Microarray data analysis

For normalization, Illumina BeadStudio version 2 software was used to subtract background and scale average signal intensity for each sample to the global average signal intensity for all samples. A gene expression analysis software program, GeneSpring GX version 7.3.1 (Agilent Technologies, hereafter referred to as GeneSpring), was used to perform further normalization. All signal intensity values less than 10 were set to equal 10. Next, per-gene normalization was applied by dividing the signal intensity of each probe in each sample by the median intensity for that probe across all samples. The exceptions were Fig. 4c, d and Supplementary Figs 9c and 10b, where signals were normalized to the median of each control group. These normalized data were used for all downstream analyses except the assessment of molecular distance to health detailed below.
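The flooring and per-gene median normalization described above can be illustrated with a minimal sketch. It is not the GeneSpring implementation; the function name, the data layout (a dictionary of probe intensities) and the example values are assumptions made for illustration only.

```python
from statistics import median

FLOOR = 10.0  # signal values below 10 are set to 10, as stated in the text


def per_gene_normalize(intensities):
    """Divide each probe's (floored) signals by that probe's median across samples.

    intensities: dict mapping a probe id to a list of signals, one per sample.
    """
    normalized = {}
    for probe, values in intensities.items():
        floored = [max(v, FLOOR) for v in values]
        probe_median = median(floored)
        normalized[probe] = [v / probe_median for v in floored]
    return normalized


# Hypothetical intensities for two probes across three samples.
example = {"probe_a": [4.0, 20.0, 40.0], "probe_b": [100.0, 200.0, 50.0]}
result = per_gene_normalize(example)
```

After this step a value of 2.0 means a signal twice the probe's median, which is the scale on which the twofold-change filter operates.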

Using GeneSpring, all transcripts were filtered first to select detected transcripts: those called ‘present’ in greater than 10% of all samples. Present calls were selected if the signal precision was less than 0.01. The remaining transcripts were filtered to select the most variable probes: those that had a minimum of twofold expression change compared with the median intensity across all samples, in greater than 10% of all samples.
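As a hedged illustration of the two filters, the sketch below applies them to a single probe. It assumes that the input values are already median-normalized ratios and that a twofold change is counted in either direction (ratio of at least 2 or at most 0.5); the function and parameter names are hypothetical.

```python
def passes_filters(norm_values, precision, frac=0.10, p_cut=0.01, fold=2.0):
    """Return True if a probe is both 'detected' and 'variable'.

    norm_values: per-sample signal divided by the probe's median across samples.
    precision:   per-sample signal precision values used for the 'present' call.
    """
    n = len(norm_values)
    # Detected: called present (precision < 0.01) in more than 10% of samples.
    present = sum(1 for p in precision if p < p_cut)
    # Variable: at least twofold above or below the median in more than 10% of samples.
    variable = sum(1 for v in norm_values if v >= fold or v <= 1.0 / fold)
    return present > frac * n and variable > frac * n


# Hypothetical probe measured in ten samples.
kept = passes_filters([1.0] * 8 + [2.5, 0.4], [0.001] * 10)
too_stable = passes_filters([1.0] * 9 + [3.0], [0.001] * 10)
undetected = passes_filters([1.0] * 8 + [2.5, 0.4], [0.5] * 10)
```

With ten samples, "greater than 10%" requires at least two samples, so a probe that changes in only one sample is dropped.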

We next performed unsupervised analysis, using hierarchical clustering and class discovery. This approach aims to create an unbiased grouping of samples on the basis of their molecular profiles, independently of any other phenotypic or clinical classification. Transcripts meeting the filtering criteria were subjected to hierarchical clustering using GeneSpring. For hierarchical clustering of genes, we used a clustering algorithm based upon Pearson correlation, creating a vertical dendrogram of genes, where transcripts with a similar expression pattern across all samples are grouped together. The distances between branches of the tree relate to the similarity of the expression patterns, and the distance between clusters is determined by the average of the distance between all points in each cluster, known as average linkage. The vertical expression profiles so generated were then subjected to the same hierarchical clustering algorithm, now grouping individual participants into horizontally presented clusters on the basis of the similarity of their expression profiles. For this stage, we based the clustering algorithm on Spearman’s rank correlation. By examining the cluster membership we can assess both whether the samples are grouping according to known factors (clinical diagnosis, demographic features) and discover if there are unknown subclasses within the data set.
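The sample-level clustering can be sketched as follows. This is a small stand-alone illustration of average-linkage agglomeration on a Spearman distance (1 minus the rank correlation), not GeneSpring's algorithm; it stops at a requested number of clusters rather than building a full dendrogram, and all names and data are hypothetical.

```python
from statistics import mean


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def pearson(x, y):
    # Assumes neither profile is constant (non-zero variance).
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def spearman_distance(x, y):
    return 1.0 - pearson(ranks(x), ranks(y))


def average_linkage(profiles, n_clusters):
    """Repeatedly merge the two clusters with the smallest mean pairwise distance."""
    n = len(profiles)
    dist = {(i, j): spearman_distance(profiles[i], profiles[j])
            for i in range(n) for j in range(i + 1, n)}
    clusters = [[i] for i in range(n)]

    def linkage(a, b):
        return mean(dist[(min(i, j), max(i, j))] for i in a for j in b)

    while len(clusters) > n_clusters:
        pairs = [(linkage(clusters[a], clusters[b]), a, b)
                 for a in range(len(clusters)) for b in range(a + 1, len(clusters))]
        _, a, b = min(pairs)
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Four hypothetical participants measured on four transcripts.
profiles = [[1, 2, 3, 4], [2, 3, 5, 9], [4, 3, 2, 1], [9, 5, 4, 2]]
groups = average_linkage(profiles, 2)
```

Because Spearman correlation depends only on rank order, participants 0 and 1 group together even though their absolute values differ.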

Supervised analysis was performed using statistical filtering and class comparison. The aim of the supervised analysis is to identify transcripts that are differentially expressed between study groups and that might serve as classifiers or yield insight into immunopathogenesis: that is, class comparison. The filtered list of transcripts generated for unsupervised analysis was used as the starting point for the supervised analysis: that is, those transcripts that were both detected and had at least a twofold change in expression compared with the median, in greater than 10% of all samples. Using GeneSpring, these transcripts were then tested using the Kruskal–Wallis test for comparisons across all study groups, with α = 0.01. Adjustment for multiple testing was applied using the Benjamini–Hochberg false discovery rate set at 1%. Lists of transcripts generated in this way were then used for hierarchical clustering as described above. Interpretation of functional roles of individual transcripts was established by searching the National Center for Biotechnology Information gene database at http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene.
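The Benjamini–Hochberg step-up procedure used for the multiple-testing adjustment can be written compactly. The sketch below is a generic implementation at a 1% false discovery rate; the P values are made up for illustration and do not come from the study.

```python
def benjamini_hochberg(p_values, fdr=0.01):
    """Return a list of booleans, True where the hypothesis is rejected at the given FDR."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) such that p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # Reject every hypothesis whose rank is at or below that cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Hypothetical per-transcript P values from a group comparison.
p = [0.0001, 0.004, 0.03, 0.0015, 0.2]
selected = benjamini_hochberg(p, fdr=0.01)
```

Here the third-smallest P value (0.004) is compared against 3/5 × 0.01 = 0.006 and passes, whereas 0.03 fails against 0.008.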

For class prediction, we used one of the tools available within GeneSpring. The prediction model used the k-nearest neighbours algorithm, with 10 neighbours and a P value ratio cutoff of 0.5. All genes from the 393 transcript list were used for the prediction. The prediction model was refined by cross-validation on the training set, with the one active outlier excluded. This model was then used to predict the classification of the samples in the independent test and validation sets. Where no prediction was made, this was recorded as an indeterminate result. Sensitivity, specificity and 95% confidence intervals were determined using GraphPad Prism version 5.02 for Windows. P values were determined using two-sided Fisher’s exact test.
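To make the idea of class prediction with indeterminate calls concrete, the following is a simplified k-nearest neighbours sketch. It does not reproduce GeneSpring's P value ratio rule; as a stand-in assumption, a sample is called indeterminate when the winning class holds less than a chosen fraction of the neighbour votes. Euclidean distance, the `min_vote` parameter and the toy data are likewise assumptions.

```python
from collections import Counter
from math import dist


def knn_predict(train, labels, sample, k=10, min_vote=0.6):
    """Predict a label from the k nearest training profiles, or 'indeterminate'."""
    nearest = sorted(range(len(train)), key=lambda i: dist(train[i], sample))[:k]
    votes = Counter(labels[i] for i in nearest)
    label, count = votes.most_common(1)[0]
    # Stand-in for the P value ratio cutoff: require a clear majority of votes.
    if count / len(nearest) < min_vote:
        return "indeterminate"
    return label


# Hypothetical two-transcript profiles for three controls and three active TB cases.
train = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [3.0, 3.1], [3.2, 2.9], [2.9, 3.0]]
labels = ["control"] * 3 + ["active"] * 3

call_active = knn_predict(train, labels, [3.1, 3.0], k=3)
call_control = knn_predict(train, labels, [1.0, 1.0], k=3)
call_unclear = knn_predict(train, labels, [2.0, 2.0], k=2)
```

Indeterminate calls are excluded from the sensitivity and specificity denominators and reported separately as an indeterminate rate.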

‘Molecular distance to health’ was calculated as previously described11. It aims to convert transcript abundance values into a representative score indicating the degree of transcriptional perturbation of a given sample compared with a healthy baseline. This is performed by determining whether the expression values of a given sample lie inside or outside two standard deviations from the mean of the healthy controls.
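A minimal sketch of such a score is given below, under stated assumptions: each transcript is expressed as a number of standard deviations from the healthy mean, transcripts within two standard deviations are ignored, and the remaining absolute deviations are summed. The published procedure (ref. 11) may apply further filters that are not reproduced here; names and values are hypothetical.

```python
from statistics import mean, stdev


def molecular_distance_to_health(sample, healthy, n_sd=2.0):
    """Return (number of perturbed transcripts, summed deviation in SD units).

    sample:  list of expression values for one participant.
    healthy: list of control profiles with transcripts in the same order.
    """
    n_outside = 0
    total = 0.0
    for g, value in enumerate(sample):
        baseline = [profile[g] for profile in healthy]
        mu, sd = mean(baseline), stdev(baseline)
        if sd == 0:
            continue  # no variation in controls; skipped in this sketch
        z = (value - mu) / sd
        if abs(z) > n_sd:
            n_outside += 1
            total += abs(z)
    return n_outside, total


# Hypothetical data: three healthy controls and one patient, two transcripts.
healthy = [[9.0, 100.0], [10.0, 110.0], [11.0, 90.0]]
patient = [15.0, 105.0]
score = molecular_distance_to_health(patient, healthy)
```

The score thus combines how many transcripts are perturbed with how far each lies from the healthy baseline, matching the description of the measure as a composite of both.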

Additional functional analysis of differentially expressed genes was performed using Ingenuity pathways analysis (Ingenuity Systems, http://www.ingenuity.com). Canonical pathways analysis identified the pathways from the Ingenuity pathways analysis that were most significantly represented in the data set. The significance of the association between the data set and the canonical pathway was measured using Fisher’s exact test to calculate a P value representing the probability that the association between the transcripts in the data set and the canonical pathway was explained by chance alone, with a Benjamini–Hochberg correction for multiple testing applied. The program can also be used to map the canonical network and overlay it with expression data from the data set.
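The over-representation test can be illustrated with a direct hypergeometric calculation. The sketch assumes a one-sided (right-tailed) Fisher's exact test, which the text does not specify, and uses invented counts; it does not reproduce the Ingenuity software or its gene annotations.

```python
from math import comb


def overrepresentation_p(k, n, K, N):
    """One-sided Fisher's exact P value, P(X >= k).

    N: transcripts in the reference set, K: of these, members of the pathway,
    n: transcripts in the signature,     k: signature transcripts in the pathway.
    """
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Hypothetical counts: 3 of 4 signature transcripts fall in a 5-member pathway
# drawn from a reference set of 20 transcripts.
p_value = overrepresentation_p(k=3, n=4, K=5, N=20)
```

When many pathways are tested, the resulting P values would then be adjusted with the Benjamini–Hochberg correction mentioned in the text.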

Transcriptional modular analysis was performed as described previously7, 11. In the context of the present study, because the modular framework was derived using Affymetrix HG U133A&B GeneChips, it was necessary to translate the probes comprising the modules into their equivalents on the Illumina platform. RefSeq identities were used to match probes between the Affymetrix HG U133 and Illumina HT-12 V3 platforms. Unambiguous matches were found for 2,071 out of the 5,348 Affymetrix probe sets, and these were used in the present modular analysis. The matching probes were preserved in their original modules. To present the global transcriptional changes graphically, for the disease group as a whole versus the healthy control group as a whole, spots are aligned on a grid, with each position corresponding to a different module based on their original definition. Spot intensity indicates the percentage of differentially expressed transcripts changing in the direction shown, from the total number of transcripts detected for that module, whereas spot colour indicates the polarity of the change (red, overrepresented; blue, underrepresented).
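The value plotted for each module can be sketched as a simple percentage calculation. The data structures, probe names and function name below are hypothetical; the sketch only illustrates counting differentially expressed transcripts in each direction as a share of the transcripts detected for that module.

```python
def module_spot_values(module_transcripts, detected, direction):
    """Compute the percentages behind one module's spot.

    module_transcripts: transcripts assigned to the module.
    detected: set of transcripts detected in the data set.
    direction: dict mapping each differentially expressed transcript to
        "up" (overrepresented) or "down" (underrepresented).
    Returns {"over": pct, "under": pct}, or None if nothing was detected.
    """
    present = [t for t in module_transcripts if t in detected]
    if not present:
        return None
    up = sum(1 for t in present if direction.get(t) == "up")
    down = sum(1 for t in present if direction.get(t) == "down")
    return {
        "over": 100.0 * up / len(present),
        "under": 100.0 * down / len(present),
    }


if __name__ == "__main__":
    module = ["A", "B", "C", "D", "E"]          # hypothetical probe names
    detected = {"A", "B", "C", "D"}             # "E" not detected
    direction = {"A": "up", "B": "up", "C": "down"}
    print(module_spot_values(module, detected, direction))
```

In a plot of this kind, the "over" percentage would set the intensity of a red spot and the "under" percentage that of a blue spot.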

Significance analysis was performed as previously described12. Transcriptional changes in whole blood were evaluated through statistical group comparison performed systematically for active TB (test set), Staphylococcus infection, Still’s disease, and adult and paediatric SLE versus their respective healthy controls. This allowed each disease group to be normalized to its own matched healthy control group, thus avoiding biological or technical confounding factors. A TB-specific whole-blood signature composed of 86 genes was identified (Supplementary Fig. 5 and Supplementary Tables 5 and 6, P<0.01) that was not present in the four other data sets (P>0.05), using a Mann–Whitney U-test with Benjamini–Hochberg false discovery rate correction for multiple testing. Class prediction was performed using the k-nearest neighbours algorithm, as before.
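The selection logic for a disease-specific signature can be sketched as a filter over per-data-set P values. The thresholds follow the text above; the data layout, the function name, the example transcript names and the treatment of a transcript missing from another data set as non-significant are assumptions.

```python
def disease_specific_signature(p_by_dataset, target, p_in=0.01, p_out=0.05):
    """Select transcripts significant in `target` but in no other data set.

    p_by_dataset: dict mapping data set name to {transcript: corrected P value},
        each computed against that data set's own healthy controls.
    A transcript is kept if P < p_in in the target data set and P > p_out
    in every other data set (missing transcripts are treated as P = 1.0).
    """
    others = [name for name in p_by_dataset if name != target]
    return sorted(
        transcript
        for transcript, p in p_by_dataset[target].items()
        if p < p_in
        and all(p_by_dataset[name].get(transcript, 1.0) > p_out for name in others)
    )


if __name__ == "__main__":
    # Hypothetical corrected P values for three transcripts
    p_values = {
        "TB": {"T1": 0.001, "T2": 0.002, "T3": 0.2},
        "SLE": {"T1": 0.5, "T2": 0.003, "T3": 0.001},
        "Staph": {"T1": 0.3, "T2": 0.6},
    }
    print(disease_specific_signature(p_values, "TB"))
```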

Multiplex serum protein measurement

One to four millilitres of blood were collected into serum clot activator tubes (either Greiner BioOne 1-ml vacuette tubes, reference 454098, Greiner BioOne; or BD 4-ml vacutainer tubes, reference 368975; Becton Dickinson). Tubes were centrifuged at 2,000g for 5 min at room temperature and the serum portion extracted and frozen at −80 °C pending analysis. Analysis was performed by multiplexed cytokine bead-based immunoassay by Millipore UK using the Milliplex Multi-Analyte Profiling system. The serum levels of 63 cytokines, chemokines, soluble receptors, growth factors, adhesion molecules and acute phase proteins were measured in this way in each sample. Samples were assayed for levels of MMP-9, C-reactive protein, serum amyloid A, EGF, eotaxin, FGF-2, Flt-3 ligand, fractalkine, G-CSF, GM-CSF, GRO, IFN-α2, IFN-γ, IL-10, IL-12p40, IL-12p70, IL-13, IL-15, IL-17, IL-1α, IL-1β, IL-1Ra, IL-2, IL-4, IL-5, IL-6, IL-7, IL-8, IL-9, CXCL10 (IP10), MCP-1, MCP-3, MIP-1α, MIP-1β, PDGF-AA, PDGF-AB/BB, RANTES, soluble CD40 ligand, soluble IL-2RA, TGF-α, TNF-α, VEGF, MIF, soluble Fas, soluble Fas ligand, tPAI-1, soluble ICAM-1, soluble VCAM-1, soluble CD30, soluble gp130, soluble IL-1RII, soluble IL-6R, soluble RAGE, soluble TNF-RI, soluble TNF-RII, IL-16, TGF-β1, TGF-β2 and TGF-β3.

Flow cytometry

Two hundred microlitres of whole blood (collected in sodium-heparin tubes) per staining panel were incubated with the appropriate antibodies for 20 min at room temperature in the dark. Red blood cells were then lysed using BD FACS lysing solution (BD Biosciences), incubating for 10 min at room temperature in the dark. Cells were spun down and washed in 2 ml FACS buffer (PBS/BSA/azide) before being fixed in 1% paraformaldehyde. Samples were then run on a Beckman Coulter Cyan using Summit software version 3.02. Analysis was performed using FlowJo version 8.7.3 for Macintosh (Tree Star). Gating strategies used are set out in Supplementary Figs 9a and 10a. Flow cytometric data are presented as dot plots (Fig. 3 and Supplementary Fig. 9b). Where appropriate, pooled flow cytometry data were tested for significance using the Mann–Whitney rank sum U-test. All antibodies were purchased from BD Pharmingen or Caltag Laboratories (Invitrogen) except for CD45RA, which was purchased from Beckman Coulter.

Statistical analysis

Molecular distance to health and modular framework analysis calculations were performed using Microsoft Excel 2003. Statistical analysis of continuous variables and correlation analysis were performed using GraphPad Prism version 5.02 for Windows (GraphPad Software). Analysis of categorical variables was performed using SPSS version 14 for Windows.

Accession codes

References

  1. Dye, C., Floyd, K. & Uplekar, M. in Global Tuberculosis Control: Surveillance, Planning, Financing Ch. 1, 17–37 (World Health Organization, 2008)
  2. Kaufmann, S. H. & McMichael, A. J. Annulling a dangerous liaison: vaccination strategies against AIDS and tuberculosis. Nature Med. 11, S33–S44 (2005)
  3. Barry, C. E., III et al. The spectrum of latent tuberculosis: rethinking the biology and intervention strategies. Nature Rev. Microbiol. 7, 845–855 (2009)
  4. Cooper, A. M. Cell-mediated immune responses in tuberculosis. Annu. Rev. Immunol. 27, 393–422 (2009)
  5. Young, D. B., Perkins, M. D., Duncan, K. & Barry, C. E., III Confronting the scientific obstacles to global control of tuberculosis. J. Clin. Invest. 118, 1255–1265 (2008)
  6. Ardura, M. I. et al. Enhanced monocyte response and decreased central memory T cells in children with invasive Staphylococcus aureus infections. PLoS ONE 4, e5446 (2009)
  7. Chaussabel, D. et al. A modular analysis framework for blood genomics studies: application to systemic lupus erythematosus. Immunity 29, 150–164 (2008)
  8. Pascual, V., Chaussabel, D. & Banchereau, J. A genomic approach to human autoimmune diseases. Annu. Rev. Immunol. 28, 535–571 (2010)
  9. Ramilo, O. et al. Gene expression patterns in blood leukocytes discriminate patients with acute infections. Blood 109, 2066–2077 (2007)
  10. Falk, A. & O’Connor, J. B. in Diagnosis Standards and Classification of Tuberculosis, vol. 12 (eds Falk, A. et al.), 68–76 (National Tuberculosis and Respiratory Disease Association, 1969)
  11. Pankla, R. et al. Genomic transcriptional profiling identifies a candidate blood biomarker signature for the diagnosis of septicemic melioidosis. Genome Biol. 10, R127.1–R127.22 (2009)
  12. Allantaz, F. et al. Blood leukocyte microarrays to diagnose systemic onset juvenile idiopathic arthritis and follow the response to IL-1 blockade. J. Exp. Med. 204, 2131–2144 (2007)
  13. Baechler, E. C. et al. Interferon-inducible gene expression signature in peripheral blood cells of patients with severe lupus. Proc. Natl Acad. Sci. USA 100, 2610–2615 (2003)
  14. Bennett, L. et al. Interferon and granulopoiesis signatures in systemic lupus erythematosus blood. J. Exp. Med. 197, 711–723 (2003)
  15. Beck, J. S., Potts, R. C., Kardjito, T. & Grange, J. M. T4 lymphopenia in patients with active pulmonary tuberculosis. Clin. Exp. Immunol. 60, 49–54 (1985)
  16. Auffray, C., Sieweke, M. H. & Geissmann, F. Blood monocytes: development, heterogeneity, and relationship with dendritic cells. Annu. Rev. Immunol. 27, 669–692 (2009)
  17. Casanova, J. L. & Abel, L. Genetic dissection of immunity to mycobacteria: the human model. Annu. Rev. Immunol. 20, 581–620 (2002)
  18. Flynn, J. L. & Chan, J. Immunology of tuberculosis. Annu. Rev. Immunol. 19, 93–129 (2001)
  19. Decker, T., Muller, M. & Stockinger, S. The yin and yang of type I interferon activity in bacterial infection. Nature Rev. Immunol. 5, 675–687 (2005)
  20. Manca, C. et al. Hypervirulent M. tuberculosis W/Beijing strains upregulate type I IFNs and increase expression of negative regulators of the Jak-Stat pathway. J. Interferon Cytokine Res. 25, 694–701 (2005)
  21. Ordway, D. et al. The hypervirulent Mycobacterium tuberculosis strain HN878 induces a potent TH1 response followed by rapid down-regulation. J. Immunol. 179, 522–531 (2007)
  22. Manca, C. et al. Virulence of a Mycobacterium tuberculosis clinical isolate in mice is determined by failure to induce Th1 type immunity and is associated with induction of IFN-alpha/beta. Proc. Natl Acad. Sci. USA 98, 5752–5757 (2001)
  23. Cooper, A. M., Pearl, J. E., Brooks, J. V., Ehlers, S. & Orme, I. M. Expression of the nitric oxide synthase 2 gene is not essential for early control of Mycobacterium tuberculosis in the murine lung. Infect. Immun. 68, 6879–6882 (2000)
  24. Telesca, C. et al. Interferon-alpha treatment of hepatitis D induces tuberculosis exacerbation in an immigrant. J. Infect. 54, e223–e226 (2007)
  25. Eum, S. Y. et al. Neutrophils are the predominant infected phagocytic cells in the airways of patients with active pulmonary tuberculosis. Chest 137, 122–128 (2010)
  26. Eruslanov, E. B. et al. Neutrophil responses to Mycobacterium tuberculosis infection in genetically susceptible and resistant mice. Infect. Immun. 73, 1744–1753 (2005)
  27. Jacobsen, M. et al. Candidate biomarkers for discrimination between infection and disease caused by Mycobacterium tuberculosis. J. Mol. Med. 85, 613–621 (2007)
  28. Mistry, R. et al. Gene-expression patterns in whole blood identify subjects at risk for recurrent tuberculosis. J. Infect. Dis. 195, 357–365 (2007)
  29. Salisbury, D., Ramsay, M. & Noakes, K. in Immunization against Infectious Disease 3rd edn 391–408 (HMSO, 2006)
  30. National Institute for Health and Clinical Excellence. Tuberculosis. Clinical Diagnosis and Management of Tuberculosis, and Measures for its Prevention and Control (Royal College of Physicians, 2006)

Acknowledgements

We thank the patients and volunteer participants. We thank D. Kioussis (MRC National Institute for Medical Research (NIMR)) and D. Young (NIMR) for discussion and input. We thank N. Baldwin (Baylor Institute for Immunology Research (BIIR)) for advice and support on bioinformatics analysis, Q.-A. Nguyen (BIIR) and colleagues for providing technical assistance with microarray processing, and S. Caidan (NIMR), J. Wills (NIMR) and S. Phillips (BIIR) for help and advice with sample storage and transport. We thank the TB service at Imperial College Healthcare NHS Trust, B.M. Haselden and the TB service at Hillingdon Hospital, Uxbridge, UK. We also thank H. Giedon and R. Seldon for help in laboratory analyses, and Y. Hlombe for recruitment of patients and follow-up in South Africa. A. Rae (NIMR), T. Dipucchio (BIIR) and K. Palucka (BIIR) provided advice on flow cytometry. We thank G. Hayward for help depositing the microarray data. We thank J. Brock (NIMR) for help with graphics. M.P.R.B. was supported by an MRC career development fellowship and a grant from the Dana Foundation Program in Human Immunology. The research was funded by the Medical Research Council, UK, MRC Grant U117565642 and The Dana Foundation Program in Human Immunology. A.O’G., C.M.G. and F.W.McN. are funded by the Medical Research Council, UK. V.P. is supported by National Institutes of Health (NIH) R01 AR050770-01, NIH P50 ARO54083 and NIH 1 U19 AI082715-01. The work of J.B., D.C. and V.P. is supported by the Baylor Health Care System Foundation and the NIH (U19 AIO57234-02, U01 AI082110, P01 CA084512).

Author information

  1. These authors contributed equally to this work.

    • Christine M. Graham &
    • Finlay W. McNab

Affiliations

  1. Division of Immunoregulation, MRC National Institute for Medical Research, The Ridgeway, Mill Hill, London NW7 1AA, UK

    • Matthew P. R. Berry,
    • Christine M. Graham,
    • Finlay W. McNab &
    • Anne O’Garra
  2. Division of Mycobacterial Research, MRC National Institute for Medical Research, The Ridgeway, Mill Hill, London NW7 1AA, UK

    • Katalin A. Wilkinson &
    • Robert J. Wilkinson
  3. Department of Respiratory Medicine, St. Mary’s Hospital, Imperial College Healthcare NHS Trust, London W2 1NY, UK

    • Susannah A. A. Bloch &
    • Onn M. Kon
  4. Institute of Infectious Diseases and Molecular Medicine, University of Cape Town, Observatory 7925, South Africa

    • Tolu Oni,
    • Katalin A. Wilkinson &
    • Robert J. Wilkinson
  5. Division of Medicine, Wright Fleming Institute, Imperial College, London, W2 1PG, UK

    • Tolu Oni &
    • Robert J. Wilkinson
  6. Baylor Institute for Immunology Research/ANRS Center for Human Vaccines, INSERM U899, 3434 Live Oak Street, Dallas, Texas 75204, USA

    • Zhaohui Xu,
    • Jason Skinner,
    • Charles Quinn,
    • John J. Cush,
    • Virginia Pascual,
    • Jacques Banchereau &
    • Damien Chaussabel
  7. Institute for Health Care Research and Improvement, Baylor Health Care System, Dallas, Texas 75206, USA

    • Derek Blankenship
  8. Department of Radiology, St Mary’s Hospital, Imperial College Healthcare NHS Trust, London W2 1NY, UK

    • Ranju Dhawan
  9. UT Southwestern Medical Center, 5323 Harry Hines Boulevard, Dallas, Texas 75390, USA

    • Romain Banchereau
  10. Center for Vaccines and Immunity, The Research Institute at Nationwide Children’s Hospital, 700 Children’s Drive, Columbus, Ohio 43205, USA

    • Asuncion Mejias &
    • Octavio Ramilo

Contributions

M.P.R.B., D.C., O.M.K. and A.O’G. designed the study on TB with input from J.B. and R.J.W. and for other diseases with input from V.P. and O.R.; M.P.R.B., S.A.A.B., T.O., K.A.W., J.J.C., A.M., R.B. and O.M.K. recruited, sampled and collected data about patients; M.P.R.B., R.B., A.M. and C.M.G. processed whole blood for microarray experiments with help from J.S.; C.G. performed blood-cell subset separations and processing for microarray experiments with help from J.S.; M.P.R.B., C.M.G. and Z.X. performed microarray data analysis, with advice and input from J.S., D.C. and V.P.; M.P.R.B. and Z.X. performed Ingenuity, modular and ‘molecular distance to health’ analyses; M.P.R.B. performed multiplex serum analyses; F.W.McN. performed flow cytometry analysis; D.C., V.P. and A.O’G. supervised data analysis; M.P.R.B. and D.B. performed statistical analysis; M.P.R.B., S.A.A.B., R.D. and O.M.K. performed analyses of radiology; A.O’G. and M.P.R.B. wrote the manuscript, with early input from C.M.G., F.W.McN., J.B., D.C. and J.S., and subsequently all authors provided advice and approved the final manuscript.

Competing financial interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to:

All microarray data are deposited in GEO under accession numbers GSE19491, GSE19444, GSE19443, GSE19442, GSE19439, GSE19435 and GSE22098. Some of the work has been submitted as US patent application PCT 371: Blood Transcriptional Signature of Mycobacterium Tuberculosis Infection: Serial No: 12/602,488.

Supplementary information

PDF files

  1. Supplementary Information (26.8M)

    This file contains Supplementary Figures 1–11 with legends and Supplementary Tables 1, 2, 4 and 7 (see separate files for Supplementary Tables 3, 5 and 6).

Excel files

  1. Supplementary Table 3 (85K)

    This table contains the 393-transcript list.

  2. Supplementary Table 5 (25K)

    This table contains patient details.

  3. Supplementary Table 6 (31K)

    This table contains the 86-transcript list.
