Diagnostic performance of the Abbott RealTime MTB assay for tuberculosis diagnosis in people living with HIV

Strengthening tuberculosis diagnosis is an international priority and the advocacy for multi-disease testing devices raises the possibility of improving laboratory efficiency. However, the advantages of centralized platforms might not translate into real improvements under operational conditions. This study aimed to evaluate the field use of the Abbott RealTime MTB (RT-MTB) and Xpert MTB/RIF assays, in a large cohort of HIV-positive and TB presumptive cases in Southern Mozambique. Over a 6-month period, 255 HIV-positive TB presumptive cases were consecutively recruited in the high TB/HIV burden district of Manhiça. The diagnostic performance of both assays was evaluated against two different reference standards: a microbiological gold standard (MGS) and a composite reference standard (CRS). Results from the primary analysis (MGS) showed improved sensitivity (Se) and reduced specificity (Sp) for the Abbott RT-MTB assay compared to the Xpert MTB/RIF (RT-MTB Se: 0.92 (95% CI: 0.75;0.99) vs Xpert Se: 0.73 (95% CI: 0.52;0.88) p value = 0.06; RT-MTB Sp: 0.80 (0.72;0.86) vs Xpert Sp: 0.96 (0.92;0.99) p value < 0.001). The lower specificity may be due to cross-reactivity with non-tuberculous mycobacteria (NTMs), the detection of non-viable MTBC, or the identification of true TB cases missed by the gold standard.


Materials and methods
Study design and setting. This is a prospective evaluation of the diagnostic performance of the Abbott RT-MTB and its RT-MTB RIF/INH assays. The study was conducted in the district of Manhiça, Maputo province, a rural area 80 km away from the capital with a population of approximately 190,000 inhabitants 27 . The HIV prevalence estimate in this district was 39.2% in 2012 28 and the latest published incidence rate of lab-confirmed TB among PLHIV (aged 18-47 years) is 847 per 100,000 population 29 .
Consecutive HIV-positive adults identified at the Manhiça District Hospital (HDM) between July and December 2016 were screened for any symptom compatible with TB, as recommended by WHO guidelines 30 . Patients with at least one of the following symptoms (cough of any duration, hemoptysis, night sweats, fever or unintentional weight loss) were invited to participate in the study. Those who self-reported having received TB treatment within the previous 6 months were excluded. All samples were tested at the Centro de Investigação em Saúde de Manhiça (CISM) laboratory.

Study procedures.
Participants were enrolled at the clinic of the National Tuberculosis Programme (NTP) in Manhiça village after providing informed consent, and data were collected through specific questionnaires. Blood samples were obtained for viral load testing using the automated RealTime HIV-1 Assay (Abbott Molecular). If participants had no recent CD4 results (documented within the last 3 months), TruCount blood tubes (Becton Dickinson Biosciences, San Jose, CA) were also collected to determine T-cell counts by flow cytometry.
As part of the TB diagnostic workup, participants provided two sputum samples, which were received at the CISM laboratory the day after enrolment. Sputum induction was performed for individuals who were unable to provide sputum spontaneously. Clinically diagnosed or laboratory-confirmed TB patients were managed according to the NTP guidelines and were started on TB treatment. In case of discordant results between RT-MTB (positive) and the standard of care (culture and Xpert negative), patients were contacted for re-evaluation and were requested to provide a third sputum sample to aid the decision on treatment initiation. All participants were scheduled for a follow-up visit 2 months after enrolment. Those who did not attend the clinic on day 60 were contacted by telephone and interviewed.
Laboratory procedures. All diagnostic tests were performed in a TB Biosafety level 3 (BSL-3) laboratory, which is subject to external quality control and ISO certification.
Smear microscopy for both sputum specimens was performed by Ziehl Neelsen (ZN) staining. Results were reported as negative or on a scale of positive grades according to international standards. For each participant, specimens with the best quality were decontaminated using the Kubica method 31 and the resuspended pellet was used for all tests to compare diagnostic accuracy among homogeneous specimens (Fig. 1).
Baseline characteristics of individuals were reported using mean and standard deviation, proportions, or median and interquartile range, depending on the variable type. The diagnostic performance of the RT-MTB and Xpert MTB/RIF was assessed through calculation of sensitivity (Se) and specificity (Sp), negative and positive predictive values (NPV and PPV) of the tests (with binomial distribution 95% confidence intervals).
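The accuracy measures described above can be illustrated with a minimal Python sketch. It assumes that the "binomial distribution" intervals are exact (Clopper-Pearson) intervals, which the text does not state explicitly. The function names and the 2x2 counts below are hypothetical and are not taken from the study tables.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def exact_ci(k, n, alpha=0.05):
    """Clopper-Pearson (exact binomial) confidence interval for k successes out of n."""

    def solve(k_cut, target):
        # binom_cdf(k_cut, n, p) decreases as p grows, so bisect on p.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if binom_cdf(k_cut, n, mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    lower = 0.0 if k == 0 else solve(k - 1, 1 - alpha / 2)
    upper = 1.0 if k == n else solve(k, alpha / 2)
    return lower, upper


def diagnostic_accuracy(tp, fp, fn, tn):
    """Return {measure: (estimate, (lower, upper))} for an index test vs a reference."""
    cells = {
        "Se": (tp, tp + fn),   # true positives among reference positives
        "Sp": (tn, tn + fp),   # true negatives among reference negatives
        "PPV": (tp, tp + fp),  # true positives among index-test positives
        "NPV": (tn, tn + fn),  # true negatives among index-test negatives
    }
    return {name: (k / n, exact_ci(k, n)) for name, (k, n) in cells.items()}


# Hypothetical counts for illustration only (not the study data).
for name, (est, (lo, hi)) in diagnostic_accuracy(tp=24, fp=38, fn=2, tn=152).items():
    print(f"{name}: {est:.2f} (95% CI: {lo:.2f};{hi:.2f})")
```

The exact interval is a conservative choice for small denominators, such as the number of culture-confirmed cases in a cohort of this size.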
Two different cohorts were evaluated: the per-protocol cohort (those who completed the follow-up visit at month 2) and the intention-to-treat cohort (all patients initially enrolled, irrespective of whether they had a follow-up visit). For the per-protocol cohort, we conducted a primary analysis using aggregated results on solid and liquid MTBC culture as a gold standard (hereinafter microbiological gold standard, MGS). Patients were classified as "microbiologically confirmed" if either liquid or solid culture was positive. Patients in whom both cultures were contaminated were excluded from the analysis. Non-tuberculous mycobacteria (NTM) growths were considered negative for MTBC. Patients in whom culture or NAAT assay results were not available (contamination or invalid results) were excluded from the analysis. For a secondary diagnostic test evaluation, a composite reference standard (CRS) was made by combining culture results and clinical information on treatment initiation (blinded to RT-MTB results). Lastly, in our "intention to treat" cohort we compared results to the microbiological reference standard.
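The classification rules above can be sketched as follows. This is an illustration under stated assumptions, not the study's analysis code: the result labels and function names are hypothetical, and the CRS is assumed to be positive when either culture is positive or TB treatment was initiated, which is one plausible reading of "combining culture results and clinical information on treatment initiation".

```python
from typing import Optional

# Hypothetical labels for a single culture result.
POSITIVE, NEGATIVE, NTM, CONTAMINATED = "positive", "negative", "ntm", "contaminated"


def mgs_status(solid: str, liquid: str) -> Optional[bool]:
    """Microbiological gold standard.

    True  = microbiologically confirmed TB (either culture positive for MTBC)
    False = not confirmed (negative or NTM growth)
    None  = excluded (both cultures contaminated)
    """
    if POSITIVE in (solid, liquid):
        return True
    if solid == CONTAMINATED and liquid == CONTAMINATED:
        return None
    return False  # NTM growth is counted as MTBC-negative


def crs_status(solid: str, liquid: str, started_treatment: bool) -> Optional[bool]:
    """Composite reference standard (assumed rule: culture positive OR treated)."""
    mgs = mgs_status(solid, liquid)
    if mgs is True or started_treatment:
        return True
    return mgs  # False, or None when no usable culture result exists


# Example: culture-negative patient started on treatment on clinical grounds.
print(mgs_status(NEGATIVE, NEGATIVE), crs_status(NEGATIVE, NEGATIVE, True))
```

Under these assumptions, a clinically diagnosed patient is a reference negative in the primary analysis but a reference positive in the secondary analysis, which is why the two standards can yield different sensitivity and specificity estimates for the same index test.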
McNemar's test with continuity correction, or the exact nominal symmetry test when discordant cells had low counts, was used to evaluate whether the performance of Xpert and RT-MTB differed systematically against each reference standard.
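A minimal sketch of the paired comparison is given below. It assumes that, for a 2x2 table of paired results, the exact symmetry test reduces to a two-sided binomial test on the discordant pairs. The function names, the switching threshold of 25 discordant pairs (a common rule of thumb, not stated in the text) and the example counts are hypothetical.

```python
from math import comb, erfc, sqrt


def mcnemar_cc(b, c):
    """McNemar chi-square with continuity correction.

    b = pairs positive by test A only, c = pairs positive by test B only.
    Returns (statistic, p_value) using a chi-square distribution with 1 df.
    """
    if b + c == 0:
        return 0.0, 1.0
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    p_value = erfc(sqrt(stat / 2))  # chi-square (1 df) survival function
    return stat, p_value


def mcnemar_exact(b, c):
    """Two-sided exact p-value: discordant pairs ~ Binomial(b + c, 0.5) under H0."""
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2**n
    return min(1.0, 2 * tail)


def compare_paired(b, c, small=25):
    """Use the exact test when discordant pairs are few (threshold is an assumption)."""
    if b + c < small:
        return "exact", mcnemar_exact(b, c)
    return "mcnemar_cc", mcnemar_cc(b, c)[1]


# Hypothetical discordant counts, for illustration only.
print(compare_paired(5, 0))
print(compare_paired(10, 30))
```

Only the discordant pairs enter either test, so a comparison restricted to reference-positive patients (sensitivity) will often have few discordant pairs and fall under the exact version.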

Results
During the 6-month study period, 255 HIV-positive and TB presumptive individuals were enrolled following informed consent. Of these, only 227 patients provided sputum samples for TB investigation. Eleven participants were excluded due to the unavailability of test results and therefore, a total of 216 patients were included in the final analysis (Fig. 2).
The sociodemographic and clinical characteristics of all participants were assessed (Table 2).

Secondary analysis (CRS).
The diagnostic test values of the per-protocol cohort using the CRS are provided in Table 4.
Table 2 (notes). ART antiretroviral therapy; 3 missing values for viral load results; the limit of detection for the Abbott RealTime HIV-1 assay was 150 copies/ml.
Intention-to-treat cohort. The overall test performance of the intention-to-treat cohort was found to be similar to the per-protocol cohort of patients, as shown in Table 3. Overall, we did not find systematic differences in sensitivity between assays, although Xpert's specificity was higher in all cases.
Table 3. Primary analysis. Diagnostic test values using aggregated culture results as the reference standard. Comparison between the per-protocol and the intention-to-treat cohort. CI confidence interval; PPV positive predictive value; NPV negative predictive value.
Operational characteristics and challenges. Our laboratory follows rigorous quality control procedures and instrument track records were registered and detailed in specific laboratory logbooks. For a head-to-head comparison of Xpert and RT-MTB assays, lab records were compared. The outcome indicated that the m2000 System required various interventions and repeated runs, as well as routine preventive maintenance. A list of technical interventions has been drawn up in Supplementary material (S3).

Discussion
This study aimed to evaluate the performance of the RT-MTB in a high TB and HIV burden region. To our knowledge, this is the largest study evaluating the RT-MTB diagnostic assay in clinical samples from a unique cohort of HIV-positive patients. We used the m2000 System for both HIV-1 viral load quantification and MTBC detection, with the purpose of adding evidence for the implementation of high-throughput and multi-disease testing on a single device. Data on the clinical performance of the RT-MTB for the diagnosis of TB are limited and reported diagnostic values differ across studies (S1). In this clinical study, we compared the ability of both the RT-MTB and Xpert assays to identify MTBC among PLHIV using two different reference standards. For the per-protocol cohort evaluation, the overall RT-MTB sensitivity on decontaminated samples was higher than that of Xpert. Our findings are in concordance with the study performed by Scott et al. 2017 22 . Although they obtained higher sensitivity values for Xpert when testing concentrated samples, RT-MTB identified a substantially higher number of MTBC cases among smear-negative patients (74.4% versus 25.7%). Similar trends were seen in our study, although there was limited statistical power to detect differences when using the composite reference standard (48% versus 29%, p value = 0.05). Analysis of the intention-to-treat cohort did not improve test parameters; however, RT-MTB still showed better sensitivity than Xpert. These results are in line with several studies showing that molecular case detection diminishes among smear-negative patients 5,33 .
On the other hand, Xpert achieved markedly higher specificity and PPVs in all analyses. RT-MTB specificity and PPV values remained lower and constant throughout. Additionally, using the CRS, the specificity of Xpert reached 100% among smear negatives, although this could be biased (incorporation bias) by the fact that clinicians were not blinded to Xpert results. To better understand the lower specificity of RT-MTB and whether it would translate into false-positive test results, we evaluated the 22 discordant results between RT-MTB and culture. Six of these cultures were positive for NTM. Cross-reactivity has not been reported previously in similar studies 19,24 , although NTM cultures are often excluded from the analyses or considered as contamination, biasing the study findings 17 . Conversely, in vitro evaluation of RT-MTB showed 97% specificity due to cross-reactivity with two samples containing NTM, although Cn values were close to the established cutoff (Cn = 40) 14 . Importantly, our setting is associated with high NTM isolation in pediatric patients investigated for TB 34 . Our results could therefore indicate some degree of cross-reactivity with NTM. Of the remaining 16 discordant samples, 2 were collected from previous TB patients, 2 were from patients who had been treated recently, 4 were from patients who died after the follow-up period, and no other relevant information was found on the remaining 8 discrepant cases. When RT-MTB Cn values were compared, the mean Cn among those potential false-positives approached the cutoff of 40 established by the manufacturer. This could support the hypothesis of the detection of either low amounts of DNA in recently treated patients, or real false-positive results. Highly sensitive molecular tests are prone to detecting DNA from non-viable bacilli 35 or cross-contamination introduced during test performance.
In a recent meta-analysis to evaluate the laboratory cross-contamination of Mycobacterium tuberculosis, 2% of all positive results were found to be false-positives for this reason 36 . Our investigation did not suggest that intra-instrument carry-over was the cause of false positivity. Furthermore, our TB laboratory strictly follows Good Clinical and Laboratory Practice standards (GCLP); thus, cross-contamination of specimens due to material transfer during pre-analytic sample handling appears unlikely.
Since the evaluation of the molecular test performance relies on comparing results to a hypothetical error-free test, using culture as a gold standard brings limitations and possible detection bias 37,38 . Liquid and solid cultures have difficulties in identifying paucibacillary specimens, common in children and HIV patients. Whether positive molecular tests from culture-negative samples are false-positives or true cases misclassified by culture is difficult to disentangle 22 . Therefore, we combined treatment initiation and positive cultures to better evaluate the accuracy of the molecular tests used in this study. Composite reference standards have been used extensively not only for tuberculosis 39-41 , but also for other infectious diseases when the evaluation of diagnostic accuracy might be compromised by weak reference standards 38 . This approach led to improved specificity but lower sensitivity. Microbiological verification of TB cases is still challenging in the diagnostic workup and last year only 56% of cases were bacteriologically confirmed; therefore, almost half of tuberculosis patients started treatment based on clinical observation 1 .
Lastly, on the importance of strengthening laboratory diagnosis in all dimensions, the theoretical advantages of centralized platforms with greater capacity for testing might be translated into real improvements under operational conditions 42 . Although the Xpert's PCR is performed in less than 2 hours 12 , just one cartridge can be tested per module, whereas up to 96 tests can be run simultaneously in the m2000sp platform. Nevertheless, these high-capacity devices often result in longer turnaround time due to sample preparation and testing, thereby making them inferior in terms of throughput. In addition, they need suitable infrastructure, qualified personnel for the instrument's setup and the capacity to perform adequate maintenance and handle any technical issues that may arise. The operational challenges (S3) we experienced with the m2000sp platform raise awareness of the importance of strengthening diagnostic capacity, not only with regard to more accurate techniques but also with regard to appropriate laboratory infrastructure, resources and trained personnel 7 .
Our study had further limitations. Firstly, we only tested the best quality specimen for liquid and solid culture because of budget concerns. Secondly, we could not assess the diagnostic accuracy of drug susceptibility testing due to a lack of phenotypically resistant strains. One of the advantages of the Abbott assay is its ability to identify rifampicin and isoniazid resistance mutations in the same DNA sample prepared to identify the presence of MTBC. However, we gained information on drug resistance in just 18 specimens with the RT-MTB RIF/INH Resistance assay. Twenty-one samples were below the LoD of the Resistance assay, which is likely due to the lower LoD of the RT-MTB assay (17 CFU/mL) compared to that of the Resistance assay (60 CFU/mL). For the remaining samples, the system reported other test errors. Additionally, a number of technical issues were encountered related to the m2000sp instrument, leading to repeats and delaying the study. Lastly, the percentage of participants lost to follow-up was higher than expected and we could not characterize false-positive samples any further through other highly sensitive molecular assays or techniques, such as sequencing, in order to assess the true specificity of the RT-MTB assay.

Conclusion
In this study, conducted among PLHIV in southern Mozambique, our results suggest better sensitivity and confirm lower specificity for the Abbott RT-MTB assay compared to the Xpert MTB/RIF. The RT-MTB assay may detect cases that would otherwise not be detected by culture, although this added yield might also be associated with some degree of cross-reactivity with NTMs, detection of non-viable mycobacteria (previously treated patients) or cross-contamination. The considerable number of "false-positive" results calls for a thorough case evaluation on an individual basis, involving trained personnel for the interpretation of molecular results and careful specimen handling to minimize the risk of cross-contamination.

Data availability
The datasets generated during the current study are kept at the data center of CISM. An anonymized version of the dataset can be made available upon request to CISM's Internal Scientific Committee (Email: cci@manhica.net).