Introduction

Tuberculosis (TB), caused by infection with Mycobacterium tuberculosis (Mtb), is one of the leading causes of death from infectious diseases. The World Health Organization (WHO) “End TB Strategy” aims to reduce TB incidence by 90% by 2035 relative to 2015 levels1, and considerably more effort is needed to reach this target. Mechanisms of Mtb infection include subversion of the expression of key microRNAs (miRNAs) that regulate the host innate and adaptive immune responses against Mtb2. The pathological effects of infection also include induction of necrosis, NOD2 signaling, type I interferon production, and autophagy3. Latent TB infection and active TB are the two main TB-related states. Progression from TB infection to active TB is multifactorial, with ESAT-6 synthesis and induction of alveolar macrophage necrosis playing key roles4. About 5–10% of patients with latent TB infection progress to active disease in the first years after primary infection, and despite the recommended treatment, 20% can still reactivate the infection5. Hence, early diagnosis and prevention of latent TB are important healthcare management strategies for controlling TB progression.

Strategies for controlling TB have focused on developing laboratory tests that identify people with latent TB who are at risk of developing active TB. The tuberculin skin test (TST) and interferon-gamma release assays (IGRAs) are widely used to evaluate the host immune response and diagnose TB6. However, their application is limited by high cost, inconvenience, and technical complexity7. Several studies have therefore proposed using more basic laboratory measures. Naranbhai et al. found that an elevated monocyte-to-lymphocyte (ML) ratio at 3–4 months of age was associated with an increased hazard of TB disease before two years of age among children8. They also reported a significant correlation between the ML ratio and TB risk among HIV-infected adults9,10. Chedid et al. reported that high white blood cell counts and low lymphocyte proportions at baseline were significantly associated with the risk of TB treatment failure11. Zhang et al. suggested that patients with marked weight loss and a relatively low white blood cell count have a higher risk of TB infection12. Rees et al. showed that a decrease in red blood cells was associated with the development of TB infection13. Basic laboratory measures are thus receiving increasing attention in TB. Although several laboratory indicators of TB risk have been assessed, more informative markers are needed for controlling TB progression. In addition, few studies have focused on the prognostic value of laboratory results in TB.

In this study, we used a machine learning least absolute shrinkage and selection operator (LASSO) model and logistic regression analyses to identify, among 23 laboratory test results, factors associated with TB risk, using data from the National Health and Nutrition Examination Survey (NHANES) 2011–2012. By linking participants to the corresponding mortality data in the National Death Index (NDI), we further evaluated the prognostic value of the key laboratory test results. This study aimed to identify factors associated with both TB risk and mortality among individuals with TB.

Methods

Data source and study population

This cross-sectional study evaluated factors associated with TB infection and with survival. Data on TB infection and other variables were obtained from the National Health and Nutrition Examination Survey (NHANES) (https://www.cdc.gov/nchs/nhanes/). The NHANES interview includes demographic, socioeconomic, dietary, and health-related questions. The examination component consists of medical, dental, and physiological measurements, as well as laboratory tests administered by highly trained medical personnel. The 2011–2012 NHANES cycle included TB testing for all participants aged ≥ 6 years. In this study, we excluded participants under the age of 18 and participants without QuantiFERON-TB Gold In-Tube (QFT-GIT) and tuberculin skin test (TST) results.
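
The inclusion rules above amount to a simple filter over participant records. The sketch below is a minimal illustration under stated assumptions: the record fields (`seqn`, `age`, `qft_result`, `tst_mm`) are hypothetical names rather than the NHANES variable names, and the filter also drops indeterminate QFT-GIT results, which are excluded under Definitions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Participant:
    seqn: int                  # respondent sequence number
    age: int                   # age in years at screening
    qft_result: Optional[str]  # hypothetical coding: "positive", "negative", "indeterminate", or None if missing
    tst_mm: Optional[float]    # TST induration in mm, or None if not read


def select_cohort(participants: List[Participant]) -> List[Participant]:
    """Keep adults (>= 18 years) with a determinate QFT-GIT result and a TST reading."""
    return [
        p for p in participants
        if p.age >= 18
        and p.qft_result in ("positive", "negative")  # missing/indeterminate excluded
        and p.tst_mm is not None
    ]
```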

The survival data of individuals with TB were obtained from the National Death Index (NDI) (https://www.cdc.gov/nchs/ndi/), which is linked to the NHANES surveys and contains over 100 million death records. The NHANES survey protocol was approved by the NCHS institutional review board. The Ethics Committee confirmed that the data could be analyzed and published with a waiver of consent from the individuals included. The data used in this study were anonymised before use.

Definitions

To determine TB infection, NHANES participants aged 6 years and older were skin tested with a tuberculin purified protein derivative (PPD) product, Tubersol, a commercially available antigen. These participants were additionally screened for TB infection with an FDA-approved IGRA blood test, the QuantiFERON-TB Gold In-Tube test (QFT-GIT). However, TST has been found to have lower specificity and lower positive predictive value than QFT-GIT. Therefore, in this study we used only QFT-GIT results to define TB infection.

The laboratory methodology for QFT-GIT has been described in detail elsewhere14. The QFT-GIT system uses specialized tubes to collect whole blood by venipuncture: a Nil control tube, a TB Antigen tube, and a Mitogen tube (positive control)15. The tubes are shaken to mix the antigens with the whole blood and incubated at 37 °C ± 1 °C for 16 to 24 h16. After incubation, plasma is harvested and the amount of IFN-γ produced in response to the peptide antigens is measured by ELISA17. Results are reported in International Units (IU) relative to a standard curve prepared by testing dilutions of a recombinant human IFN-γ standard18.
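
As an illustration of how ELISA optical densities (ODs) might be converted to IU/mL, the sketch below fits a straight-line standard curve by least squares and inverts it. The linear model, the standard concentrations in the example, and the function names are assumptions for illustration only; the QFT-GIT analysis software may use a different curve model.

```python
from typing import Sequence, Tuple


def fit_standard_curve(concentrations: Sequence[float], ods: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line OD = intercept + slope * concentration (IU/mL)."""
    n = len(concentrations)
    mean_c = sum(concentrations) / n
    mean_od = sum(ods) / n
    sxx = sum((c - mean_c) ** 2 for c in concentrations)
    sxy = sum((c - mean_c) * (od - mean_od) for c, od in zip(concentrations, ods))
    slope = sxy / sxx
    intercept = mean_od - slope * mean_c
    return slope, intercept


def od_to_iu_per_ml(od: float, slope: float, intercept: float) -> float:
    """Invert the fitted standard curve to estimate a sample's IFN-γ concentration."""
    return (od - intercept) / slope
```

For example, standards of 0, 0.25, 1, and 4 IU/mL read at ODs of 0.05, 0.175, 0.55, and 2.05 give a slope of 0.5 and an intercept of 0.05, so a sample OD of 0.55 maps back to 1.0 IU/mL.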

In this study, individuals were divided into QFT-GIT negative and positive groups. According to the NHANES guidelines, a QFT-GIT result was interpreted as positive when all of the following criteria were met: 1) the Nil value was ≤ 8.0 IU IFN-γ/mL; 2) the TB Antigen value minus the Nil value was ≥ 0.35 IU IFN-γ/mL; and 3) the TB Antigen value minus the Nil value was ≥ 25% of the Nil value. A low response to mitogen (< 0.5 IU/mL) indicates an indeterminate result when a blood sample also has a negative response to the TB antigens. Individuals with missing or indeterminate QFT-GIT results were excluded.
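
The interpretation rules above can be sketched as a small classifier operating on IFN-γ concentrations in IU/mL. Two details are assumptions not stated in the text: the "response to mitogen" is taken as Mitogen minus Nil, and a Nil value above 8.0 IU/mL is treated as indeterminate, following the manufacturer's usual interpretation scheme. The function name is hypothetical.

```python
def interpret_qft(nil: float, tb_antigen: float, mitogen: float) -> str:
    """Classify a QFT-GIT result from Nil, TB Antigen and Mitogen tube values (IU/mL)."""
    # Assumption: a high background (Nil > 8.0) makes the test uninterpretable.
    if nil > 8.0:
        return "indeterminate"
    tb_response = tb_antigen - nil
    # Positive: TB - Nil >= 0.35 IU/mL and >= 25% of the Nil value.
    if tb_response >= 0.35 and tb_response >= 0.25 * nil:
        return "positive"
    # Negative TB response: only interpretable if the positive control responded.
    # Assumption: "response to mitogen" means Mitogen - Nil.
    if mitogen - nil < 0.5:
        return "indeterminate"
    return "negative"
```

Note how criterion 3 matters when the background is elevated: with Nil = 2.0 and TB Antigen = 2.4, the TB response of 0.4 IU/mL clears the 0.35 threshold but not 25% of Nil (0.5), so the sample is classified as negative if the mitogen control responded.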

Variables

The QFT-GIT result and survival status were used as dependent variables, and 36 variables were included in the study. The qualitative variables included gender, age, race, education level, marital status, having lived in a household with a person sick with TB, BMI (kg/m²), tuberculin skin test (TST) result, history of TB exposure, use of TB medication, arthritis type, smoking status, alcohol use, and diabetes. TST was performed with 0.1 mL of tuberculin antigen (Tubersol) and read by NHANES staff 46–74 h after placement. A positive TST was defined as an induration of ≥ 10 mm (the cut-off commonly used for adults in the US, except for individuals with special risks). The study protocol specified that at least two separate readers, blinded to each other’s measurements, would measure the TST reactions of > 25% of participants. Readers worked in separate rooms and recorded measurements in a computer database; measurements recorded on the first screen were not accessible to subsequent readers.
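
The TST definitions above reduce to two simple checks, sketched below. The function names are hypothetical; the 10 mm cut-off and the 46–74 h reading window are taken from the text, and how readings outside the window were handled is not stated there.

```python
def tst_positive(induration_mm: float, cutoff_mm: float = 10.0) -> bool:
    """Positive TST if induration is at least the cut-off (10 mm in this study)."""
    return induration_mm >= cutoff_mm


def reading_in_window(hours_after_placement: float,
                      earliest: float = 46.0, latest: float = 74.0) -> bool:
    """True if the TST was read within the 46-74 h window after placement."""
    return earliest <= hours_after_placement <= latest
```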

We also included 23 laboratory test results: white blood cell count, albumin, sodium, lymphocyte percentage, monocyte percentage, neutrophil percentage, eosinophil percentage, basophil percentage, lymphocyte count, monocyte count, neutrophil count, eosinophil count, basophil count, red blood cell count, hemoglobin, hematocrit, platelet count, monocyte-to-lymphocyte ratio (MLR), platelet-to-monocyte ratio (PMR), platelet-to-lymphocyte ratio (PLR), neutrophil-to-lymphocyte ratio (NLR), platelet-to-neutrophil ratio (PNR), and prognostic nutritional index (PNI).
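
The five ratios and the PNI are derived from the blood count and albumin. The sketch below assumes that the ratios use absolute counts (in 10^9 cells/L, equivalently 1000 cells/µL) and that PNI follows the commonly used Onodera formula, 10 × albumin (g/dL) + 0.005 × lymphocyte count (cells/mm³); the text does not state these formulas, and the function name is hypothetical.

```python
from typing import Dict


def derived_indices(lymph: float, mono: float, neut: float, plt: float,
                    albumin_g_dl: float) -> Dict[str, float]:
    """Compute MLR, PMR, PLR, NLR, PNR and PNI.

    lymph, mono, neut, plt: absolute counts in 10^9 cells/L (= 1000 cells/µL).
    albumin_g_dl: serum albumin in g/dL.
    """
    return {
        "MLR": mono / lymph,
        "PMR": plt / mono,
        "PLR": plt / lymph,
        "NLR": neut / lymph,
        "PNR": plt / neut,
        # Assumed Onodera PNI; 1 µL = 1 mm³, so lymph * 1000 is cells/mm³.
        "PNI": 10.0 * albumin_g_dl + 0.005 * lymph * 1000.0,
    }
```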

Statistical analysis

We merged records from the different NHANES data files by respondent sequence number. Qualitative variables are presented as frequencies, and differences between groups were compared by the χ² test. Quantitative variables were first assessed for normality and are presented as means ± standard error (SE); differences in quantitative variables were compared by the Bonferroni test. For all analyses, P < 0.05 was considered statistically significant.
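
NHANES distributes each survey component as a separate file keyed by the respondent sequence number (SEQN), so merging amounts to an inner join on that key. The sketch below uses plain dictionaries in place of the real data files (the table contents are hypothetical) and adds a Pearson χ² test for a 2 × 2 table without continuity correction, one common form of the test named above. The P value uses the fact that a χ² statistic with 1 degree of freedom is the square of a standard normal variable.

```python
import math
from typing import Dict, Tuple

Table = Dict[int, Dict[str, object]]  # SEQN -> {variable name: value}


def merge_by_seqn(*tables: Table) -> Table:
    """Inner join: keep only SEQNs present in every table, combining their fields."""
    common = set(tables[0]).intersection(*tables[1:])
    merged: Table = {}
    for seqn in sorted(common):
        row: Dict[str, object] = {}
        for table in tables:
            row.update(table[seqn])
        merged[seqn] = row
    return merged


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson χ² (no continuity correction) and P value for [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 df, P(χ² > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```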

We first used the “glmnet” R package to fit a least absolute shrinkage and selection operator (LASSO) model to screen candidate variables. The QFT-GIT result (positive vs negative) was the dependent variable, and all laboratory test results were entered as independent variables. Ten-fold cross-validation was used to select the penalty parameter lambda (λ). LASSO shrinks the coefficients of uninformative variables to zero, and the variables with nonzero coefficients were selected for further analysis. We next constructed a machine learning model to estimate the weight importance of the nonzero variables, with the following parameters: 1) 5 folds for cross-validation; 2) a maximum of 1000 iterations; 3) a convergence tolerance of 0.0001; and 4) alpha of 1.86e-04.
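
To show what the LASSO step does, the sketch below minimizes the penalized objective used for a binomial LASSO (mean logistic loss plus λ times the sum of absolute coefficients, with the intercept unpenalized) and chooses λ by k-fold cross-validated deviance over a fixed grid. It is not glmnet: it swaps in proximal gradient descent (ISTA), whose soft-thresholding step is what sets coefficients exactly to zero, instead of glmnet's coordinate descent along a λ path. All function names are hypothetical, and features are assumed to be standardized beforehand.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t*|v|: shrink toward 0, returning 0 when |v| <= t."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def linear_predictor(b0: float, beta: Sequence[float], row: Sequence[float]) -> float:
    return b0 + sum(bj * xj for bj, xj in zip(beta, row))


def fit_logistic_lasso(X: List[List[float]], y: List[int], lam: float,
                       max_iter: int = 1000, tol: float = 1e-4) -> Tuple[float, List[float]]:
    """Minimize mean logistic loss + lam * sum(|beta_j|) by ISTA (intercept unpenalized)."""
    n, p = len(X), len(X[0])
    # Safe step 1/L: the gradient is Lipschitz with L <= trace([1, X]^T [1, X] / n) / 4.
    step = 4.0 / (1.0 + sum(v * v for row in X for v in row) / n)
    b0, beta = 0.0, [0.0] * p
    for _ in range(max_iter):
        resid = [sigmoid(linear_predictor(b0, beta, row)) - yi for row, yi in zip(X, y)]
        grad0 = sum(resid) / n
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        new_b0 = b0 - step * grad0
        new_beta = [soft_threshold(bj - step * gj, step * lam) for bj, gj in zip(beta, grad)]
        change = max([abs(new_b0 - b0)] + [abs(a - b) for a, b in zip(new_beta, beta)])
        b0, beta = new_b0, new_beta
        if change < tol:
            break
    return b0, beta


def mean_deviance(b0: float, beta: Sequence[float], X: List[List[float]], y: List[int]) -> float:
    """Mean binomial deviance of the fitted model on (X, y)."""
    eps = 1e-12
    total = 0.0
    for row, yi in zip(X, y):
        pr = min(max(sigmoid(linear_predictor(b0, beta, row)), eps), 1 - eps)
        total -= 2.0 * (yi * math.log(pr) + (1 - yi) * math.log(1 - pr))
    return total / len(y)


def cv_select_lambda(X: List[List[float]], y: List[int], lambdas: Sequence[float],
                     k: int = 10, seed: int = 0) -> Tuple[float, Dict[float, float]]:
    """Return the lambda with the lowest k-fold cross-validated mean deviance."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    cv_error: Dict[float, float] = {}
    for lam in lambdas:
        err = 0.0
        for fold in folds:
            held_out = set(fold)
            train = [i for i in idx if i not in held_out]
            b0, beta = fit_logistic_lasso([X[i] for i in train], [y[i] for i in train], lam)
            err += mean_deviance(b0, beta, [X[i] for i in fold], [y[i] for i in fold])
        cv_error[lam] = err / k
    return min(cv_error, key=cv_error.get), cv_error
```

With a large enough λ every coefficient is zero (an intercept-only model); as λ decreases, variables enter the model one by one, which is the coefficient-path picture summarized in a LASSO coefficient plot.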

We then performed univariable logistic regression to evaluate the association between each nonzero variable and the QFT-GIT result. Variables with P < 0.05 in univariable analysis were entered into multivariable logistic regression to identify independent factors. For these independent factors, we performed logistic regression analyses to examine their associations with QFT-GIT results after adjusting for potential confounders. We then constructed a comprehensive nomogram model incorporating the independent factors and significant clinical characteristics to predict positive QFT-GIT. Decision curve analysis (DCA) was used to assess clinical net benefit, and receiver operating characteristic (ROC) analysis in the training and validation sets was used to evaluate the predictive performance of the comprehensive model. We also performed ROC and DCA analyses to evaluate the performance of the comprehensive model for death risk in individuals with positive QFT-GIT. Finally, the associations between the independent factors and death risk among individuals with positive QFT-GIT were assessed by logistic regression with adjustment for potential confounders.
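
The univariable screening step can be sketched as follows: each candidate variable gets its own logistic model, fitted here by Newton–Raphson, and is carried forward if its Wald P value is below 0.05. The function names are hypothetical, and the Wald test with a 1.96-based 95% CI is the usual large-sample choice, assumed rather than confirmed by the text. The multivariable step, which refits all retained variables jointly, is not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def _score_and_information(b0: float, b1: float, x: Sequence[float], y: Sequence[int]):
    """Log-likelihood gradient and Fisher information for logit(p) = b0 + b1 * x."""
    g0 = g1 = i00 = i01 = i11 = 0.0
    for xi, yi in zip(x, y):
        p = _sigmoid(b0 + b1 * xi)
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        i00 += w
        i01 += w * xi
        i11 += w * xi * xi
    return (g0, g1), (i00, i01, i11)


def univariable_logistic(x: Sequence[float], y: Sequence[int], max_iter: int = 50,
                         tol: float = 1e-10) -> Tuple[float, Tuple[float, float], float]:
    """Return (odds ratio, 95% CI, Wald P value) for a single predictor x."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        (g0, g1), (i00, i01, i11) = _score_and_information(b0, b1, x, y)
        det = i00 * i11 - i01 * i01
        # Newton step: solve information @ delta = score for the 2x2 system.
        d0 = (i11 * g0 - i01 * g1) / det
        d1 = (i00 * g1 - i01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break
    _, (i00, i01, i11) = _score_and_information(b0, b1, x, y)
    se = math.sqrt(i00 / (i00 * i11 - i01 * i01))  # sqrt of [inverse information]_11
    p_value = math.erfc(abs(b1 / se) / math.sqrt(2.0))
    ci = (math.exp(b1 - 1.96 * se), math.exp(b1 + 1.96 * se))
    return math.exp(b1), ci, p_value


def screen_candidates(candidates: Dict[str, List[float]], y: List[int],
                      alpha: float = 0.05) -> List[str]:
    """Keep variables whose univariable Wald P value is below alpha."""
    return [name for name, x in candidates.items() if univariable_logistic(x, y)[2] < alpha]
```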

Results

Baseline characteristics of individuals

The NHANES 2011–2012 cycle contained 9756 participants. A total of 5256 participants aged 18–85 years with QFT-GIT results were included in this study, of whom 521 (9.9%) had positive QFT-GIT results. The baseline sociodemographic characteristics stratified by QFT-GIT status are shown in Table 1.

Table 1 The baseline characteristics of qualitative variables stratified by age and QFT-GIT results.

In the 18–34 age group, QFT-GIT results differed significantly by education level (P = 0.018) and TB medication use (P = 0.039). In both the 35–64 and 65–85 age groups, QFT-GIT results differed significantly by gender, race, education level, having lived in a household with a person sick with TB, and history of TB exposure (all P < 0.05). In addition, TB medication use (P = 0.015) and alcohol use (P = 0.001) were related to QFT-GIT results in the 35–64 age group, and arthritis type (P = 0.011) and smoking status (P = 0.006) in the 65–85 age group. In none of the 3 age groups were marital status, BMI, or diabetes associated with QFT-GIT results (all P > 0.05).

Comparison of the 23 laboratory results (Table 2) showed no significant differences between the positive and negative QFT-GIT groups in monocyte percentage, basophil percentage, lymphocyte count, eosinophil count, basophil count, red blood cell count, hematocrit, PMR, PNR, and PNI (all P > 0.05). All remaining laboratory test results differed significantly between the 2 groups (all P < 0.05).

Table 2 The baseline characteristics of quantitative variables stratified by QFT-GIT results (means ± SE).

Association of laboratory variables with QFT-GIT results

We then performed LASSO analysis to select the laboratory variables associated with QFT-GIT results. Figure 1A shows the coefficient profiles of the 23 laboratory results in LASSO regression, and the optimal λ was determined by ten-fold cross-validation (Fig. 1B). With the optimal λ of 0.003, 11 variables with nonzero coefficients were identified: white blood cell count, albumin, sodium, eosinophil count, neutrophil count, red blood cell count, platelet count, monocyte-to-lymphocyte ratio (MLR), platelet-to-lymphocyte ratio (PLR), neutrophil-to-lymphocyte ratio (NLR), and prognostic nutritional index (PNI).

Figure 1

Variable selection associated with positive QFT-GIT using LASSO regression analysis. (A) LASSO coefficient profiles of the 23 laboratory variables. (B) Ten-fold cross-validation for tuning parameter selection in the LASSO model. (C) Weight importance evaluation of nonzero variables.

We then used a machine learning model based on the LASSO analysis to determine the weight importance of the nonzero variables (Fig. 1C). Ranked by their contribution to this model, MLR, neutrophil count, white blood cell count, red blood cell count, and eosinophil count were the five most important variables, with MLR showing the strongest association with QFT-GIT results.

We then examined the associations between the 11 variables with nonzero coefficients and QFT-GIT results by logistic regression (Table 3). Univariable analysis showed that 9 variables were significantly related to QFT-GIT results (all P < 0.05). Entering these 9 variables into multivariable analysis identified sodium (OR = 1.050, P = 0.024) and MLR (OR = 0.188, P = 0.025) as independent factors for QFT-GIT results. Each 1 mmol/L increase in sodium multiplied the odds of a positive QFT-GIT by 1.050 (about a 5% increase), whereas each 1-unit increase in MLR multiplied the odds by 0.188 (about an 81% decrease).
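
Because logistic regression assumes that the log-odds change linearly with the predictor, an OR reported per 1-unit change can be rescaled to any other increment by exponentiation, and converted to a percentage change in odds. The helper names below are hypothetical; the ORs plugged in are the multivariable estimates reported above.

```python
import math


def or_per_increment(or_per_unit: float, increment: float) -> float:
    """Rescale an OR reported per 1-unit change to a different increment."""
    return math.exp(increment * math.log(or_per_unit))


def percent_change_in_odds(odds_ratio: float) -> float:
    """Percentage change in odds implied by an odds ratio."""
    return (odds_ratio - 1.0) * 100.0


sodium_or = 1.050  # per 1 mmol/L
mlr_or = 0.188     # per 1-unit MLR
print(percent_change_in_odds(sodium_or))      # about +5% odds per 1 mmol/L
print(or_per_increment(sodium_or, 5.0))       # OR per 5 mmol/L, about 1.276
print(percent_change_in_odds(mlr_or))         # about -81% odds per 1-unit MLR
print(or_per_increment(mlr_or, 0.1))          # OR per 0.1-unit MLR, about 0.846
```

For example, `or_per_increment(0.188, 0.1)` is about 0.846, i.e. roughly 15% lower odds of a positive QFT-GIT per 0.1-unit rise in MLR, which may be easier to interpret than a full 1-unit change.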

Table 3 Correlation between 11 laboratory variables and QFT-GIT by logistic regression analyses.

We further used multivariable logistic regression to estimate the ORs of the 2 independent factors for QFT-GIT results after adjusting for potential confounders (Table 4). In model 1, MLR (OR = 0.112, P < 0.001), but not sodium, was significantly associated with positive QFT-GIT. With further adjustment for other confounders, higher sodium levels were significantly associated with positive QFT-GIT in model 2 (OR = 1.151, P < 0.05), model 3 (OR = 1.066, P < 0.05), and model 4 (OR = 1.223, P < 0.05), while the association of MLR with QFT-GIT results was observed in model 3. These results indicate significant relationships of both MLR and sodium with QFT-GIT results.

Table 4 Association of independent factors with QFT-GIT after adjusting variables.

Comprehensive model construction

Given the importance of MLR and sodium for QFT-GIT results, we further established an integrated nomogram model incorporating MLR, sodium, and clinical characteristics.

Before the nomogram analysis, we first explored the associations of clinical characteristics with QFT-GIT results by multivariable logistic regression (Table 5) and found that gender, age, marital status, TST, and having lived in a household with a person sick with TB were independently related to QFT-GIT results (all P < 0.05).

Table 5 Correlation of clinical variables with QFT-GIT results.

We then integrated gender, age, marital status, TST, having lived in a household with a person sick with TB, MLR, and sodium into a comprehensive nomogram model. The nomogram (Fig. 2A) indicated that the comprehensive model could predict a probability of positive QFT-GIT of up to 0.8, with MLR making the largest contribution to the prediction of QFT-GIT results. DCA showed that the comprehensive model achieved a greater net benefit than either the treat-all or the treat-none strategy across the full range of threshold probabilities (Fig. 2B). In addition, ROC analyses (Fig. 3) in both the training set (AUC = 0.791) and the validation set (AUC = 0.762) showed that the comprehensive model performed favorably in predicting positive QFT-GIT.
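
The two performance summaries used for the comprehensive model can be illustrated with a short sketch. AUC is computed here as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case (ties count one half), which equals the area under the empirical ROC curve. Net benefit follows the standard decision-curve formula NB = TP/n − (FP/n) × pt/(1 − pt) at threshold probability pt, with the treat-none strategy fixed at 0. Function names are hypothetical and the example values are made-up toy data, not study data.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise (Mann-Whitney) comparison."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(probs: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """Net benefit of treating everyone whose predicted probability >= threshold."""
    n = len(labels)
    tp = sum(1 for p, l in zip(probs, labels) if p >= threshold and l == 1)
    fp = sum(1 for p, l in zip(probs, labels) if p >= threshold and l == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(labels: Sequence[int], threshold: float) -> float:
    """Net benefit of the treat-all reference strategy (treat-none is always 0)."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)
```

For example, `auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0])` evaluates to 0.75, since three of the four positive–negative pairs are ranked correctly. A decision curve is obtained by evaluating `net_benefit` over a grid of thresholds and plotting it against the treat-all and treat-none curves.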

Figure 2

Development of the comprehensive nomogram model and its performance on QFT-GIT results. (A) Nomogram was constructed with the laboratory signatures and clinical variables. (B) Decision curve analysis (DCA) was used to assess the clinical usefulness of the comprehensive model. The x-axis indicates the threshold probability. The y-axis indicates the net benefit.

Figure 3

ROC analysis was used to assess the performance of the comprehensive model in training and validation sets.

Correlation between variables and mortality risk in individuals with positive QFT-GIT

We then explored whether the comprehensive model could predict outcomes in individuals with positive QFT-GIT. ROC analysis showed favorable performance of the comprehensive model in predicting death (Fig. 4A, AUC = 0.841), and DCA indicated that the model achieved a greater net benefit across the full range of threshold probabilities (Fig. 4B).

Figure 4

Development of the comprehensive model and its performance on death risk in individuals with positive QFT-GIT. (A) ROC curve for predicting death. (B) Decision curve analysis (DCA) assessing clinical usefulness. The x-axis indicates the threshold probability. The y-axis indicates the net benefit.

We further assessed the associations of MLR and sodium themselves with death risk in individuals with positive QFT-GIT (Table 6). Sodium was associated with death in neither the crude model nor the adjusted models. However, MLR was significantly associated with death risk in the crude model and in all adjusted models (all P < 0.05). These results further confirm the importance of MLR for disease progression among individuals with positive QFT-GIT.

Table 6 Association of independent factors with death using multivariate logistic regression.

Discussion

This study evaluated the associations of 23 laboratory variables with positive QFT-GIT and found that MLR (monocyte-to-lymphocyte ratio) and sodium were independent factors associated with QFT-GIT results. As MLR increased, the risk of positive QFT-GIT decreased, highlighting the importance of MLR in TB infection. MLR is increasingly attracting attention as a biomarker. For example, Cheng et al. found that MLR was significantly associated with an increased risk of depression19. Huang et al. found that MLR was associated with 2-year relapse in patients with multiple sclerosis20. Kamiya et al. found that MLR can be a helpful diagnostic marker for lymphoma in adults with peripheral lymphadenopathy of unclear etiology21. The potential clinical value of MLR has thus been demonstrated in various diseases.

Previous studies have also reported a significant role of MLR in TB-related disease. Chen et al. showed that MLR was an independent factor in the diagnosis of spinal TB and was associated with its severity22. Choudhary et al. found that MLR was related to TB and declined with anti-TB treatment in HIV-infected children23. Gatechompol et al. showed that increased MLR can predict incident TB among people living with HIV on or after antiretroviral therapy; at a cut-point of 0.23, MLR provided a diagnostic AUC of 0.849, a sensitivity of 85%, and a specificity of 71%24. Sukson et al. showed that MLR > 0.45 was the best cut-off point for diagnosing TB pleuritis, with a sensitivity and specificity of 82.5% and 86.3%, respectively25. These studies demonstrate the clinical value of MLR in TB-related diseases, and its significance in TB warrants further investigation.

We further explored the relationship between MLR and mortality risk in individuals with positive QFT-GIT and found that MLR remained an important prognostic predictor after adjusting for potential confounders: the higher the MLR, the higher the mortality risk. Circulating monocytes are recruited early to infection sites, where they can be induced to differentiate into M2 macrophages that suppress the adaptive immune response26. Lymphocytes, in contrast, are crucial for the adaptive immune response, and a decrease in the absolute lymphocyte count may reflect an insufficient host immune response to the disease, thereby promoting disease progression27. MLR has therefore been regarded as an index of inflammation and immune suppression. Olivia et al. showed that individuals with active TB had significantly higher MLR than both individuals with latent TB and those without latent TB28, suggesting that MLR is related to TB severity. In this study, a higher MLR predicted a higher mortality risk among individuals with positive QFT-GIT. We speculate that individuals with positive QFT-GIT might be constantly exposed to Mtb, causing persistent inflammation at infection sites and thus promoting TB progression. Hence, correcting the dysregulated host immune response might help control tuberculosis progression.

Our study has several limitations. It defined tuberculosis infection as a positive QFT-GIT result, which may yield false positives even if IGRA specificity is 95%. Dorman et al. found that most conversions among healthcare workers in low TB incidence settings appeared to be false positives, and that these occurred 6 to 9 times more frequently with IGRAs than with TST29. Mancuso et al. found that in low-prevalence populations, most discordance between different tests for latent tuberculosis infection can be explained by false positives30. In addition, because of its cross-sectional design, our study could not determine the temporal relationship between the identified factors and TB.
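
Why even a 95% specific test can produce many false positives is a matter of Bayes' rule: the positive predictive value (PPV) depends on prevalence as well as on test accuracy. The sketch below uses the 95% specificity mentioned above together with a hypothetical sensitivity of 80% and hypothetical prevalences; none of these inputs are estimates from this study.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value: share of positive results that are true positives."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


for prev in (0.05, 0.20):
    print(prev, round(ppv(0.80, 0.95, prev), 3))
```

With these hypothetical inputs, PPV is about 0.46 at 5% prevalence, so more than half of positive results would be false positives; at 20% prevalence it rises to 0.80.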

Conclusions

This study included 5256 individuals, 521 of whom had a positive QFT-GIT result. Through LASSO and logistic regression analyses of 23 laboratory variables, sodium and MLR were identified as independently associated with QFT-GIT results, and these associations persisted after adjustment for potential confounders. A comprehensive nomogram model based on MLR, sodium, and significant clinical characteristics performed favorably in predicting both QFT-GIT results and death risk in individuals with positive QFT-GIT. Further analysis showed that MLR, but not sodium, was independently related to death risk. Our findings suggest that MLR might be an important factor in the initiation and progression of TB.