A Multinational Analysis of Mutations and Heterogeneity in PZase, RpsA, and PanD Associated with Pyrazinamide Resistance in M/XDR Mycobacterium tuberculosis

Pyrazinamide (PZA) is an important first-line drug in all existing and new tuberculosis (TB) treatment regimens. PZA-resistance in M. tuberculosis is increasing, especially among M/XDR cases. Noted issues with PZA Drug Susceptibility Testing (DST) have driven the search for alternative tests. This study provides a comprehensive assessment of PZA molecular diagnostics in M/XDR TB cases. A set of 296, mostly XDR, clinical M. tuberculosis isolates from four countries were subjected to DST for eight drugs, confirmatory Wayne’s assay, and whole-genome sequencing. Three genes implicated in PZA resistance, pncA, rpsA, and panD were investigated. Assuming all non-synonymous mutations cause resistance, we report 90% sensitivity and 65% specificity for a pncA-based molecular test. The addition of rpsA and panD potentially provides 2% increase in sensitivity. Molecular heterogeneity in pncA was associated with resistance and should be evaluated as a diagnostic tool. Mutations near the N-terminus and C-terminus of PZase were associated with East-Asian and Euro-American lineages, respectively. Finally, Euro-American isolates are most likely to have a wild-type PZase and escape molecular detection. Overall, the 8–10% resistance without markers may point to alternative mechanisms of resistance. Confirmatory mutagenesis may improve the disconcertingly low specificity but reduce sensitivity since not all mutations may cause resistance.


Results
Phenotypic testing. Out  Mutations in pncA of PZA R Isolates. Of the 224 PZA R isolates, 202 harbored a non-synonymous mutation in pncA and/or its promoter. Of these, 195 isolates (87%) harbored a mutation only in the gene, four (2%) only in its promoter, and 3 (1%) in both. The two PZA mono-resistant isolates belonged to the first group. Synonymous mutations in the coding region were ignored in our analyses. Those isolates that only harbored a synonymous mutation in a gene were labeled as having a WT protein in this study. Supplementary Table ST2 provides a comprehensive list of all mutations harbored by all isolates. Importantly, 22 PZA R isolates (10%) had a WT pncA gene and promoter. All 22 tested negative for PZase activity (Table 2 and Supplementary Table ST2). These 22 isolates were not part of the same clonal expansion (based on the lineage typing) (Supplementary Table ST2).
In 224 PZA R isolates, we observed 136 unique protein-altering polymorphisms in pncA and 6 unique mutations in its promoter. Of these, 40 polymorphisms in the gene and three (all indels) in the promoter had not been previously reported and are referred to here as novel mutations (Supplementary Table ST3) 14 .
The distribution of mutations across the gene is shown in Fig. 1  Mutations in pncA of PZA S isolates. Of the 72 susceptible isolates, 25 harbored a mutation in the coding region of pncA and/or in its promoter (Table 3 and Fig. 1b). Twenty-six (22 coding and 4 promoter) unique mutations in pncA of PZA S isolates were observed. Of the coding region mutations, 13 were novel, and of the promoter mutations, 2 were novel (Supplementary Table ST4). The most frequent non-synonymous polymorphism was Thr47Ala, occurring in three PZA S isolates, all of which tested positive for PZase activity (Fig. 1b). Figure SF1 depicts lineage-based stratification of mutations observed in pncA, both in resistant and susceptible isolates. The gene and its promoter were divided into seven "zones" and prevalence of mutations in isolates from each lineage was assessed. The results of this analysis are shown in Supplementary Table ST5. Most notable was the relatively high percentage (15%) of Euro-American PZA R isolates with WT pncA and promoter. Furthermore, "hot spots" were observed in specific lineages: codons 1-30 for East-Asian, 121-150 for Indo-Oceanic, and codons 151 and higher for Euro-American.

Lineage-based Analysis of pncA mutations. Supplementary
Similarly, there appears to be "cold spots" (31-60 for East-Asian and codons 91-120 for Euro-American), where very few isolates from these lineages harbored a mutation. Finally, the range between codons 91 and 120 seems to be a cold spot for all lineages, except for East-Asian. These patterns need to be confirmed in larger cohorts.

Mutations in rpsA.
Twenty PZA R and eight PZA S isolates harbored a mutation in RpsA (18 PZA R , seven PZA S ) or rpsA's promoter (2 PZA R , 1 PZA S ) region (Supplementary Table ST2). Of these 14 PZA R and 4 PZA S isolates only had heterogeneous mutations in the rpsA and its promoter. Of the 22 resistant isolates without a mutation in pncA or its promoter, three had a heterogeneous, non-synonymous mutation in rpsA (deletion of C in nucleotide 660, deletion of a C in nucleotide 1065 [novel], and deletion of a C in nucleotide 1142 [novel]). The three showed no PZase activity on Wayne's assay and had no mutations in panD. The most frequent rpsA variant was the previously reported synonymous change, Arg212Arg (99 PZA R and 20 PZA S isolates) 26 . This mutation has been identified as a Lineage 2 (East-Asian) marker. In our set, this mutation was harbored by three Euro-American isolates as well (Supplementary Table ST6) 29 .

Mutations in panD.
Previous studies have reported panD to be a potential target for PZA 25,30 . In this study, no monoclonal mutations were found in panD's promoter or coding region. Of the PZA R isolates without a mutation in pncA, one had a heterogeneous mutation in panD (-G291) (Supplementary Table ST2). This isolate showed no PZase activity.

Resistant Cases with WT promoter and coding regions in the three genes. Eighteen PZA R isolates
had no mutations in the promoter or the enzyme of the three genes considered in this study. All tested negative for enzymatic activity. Nine of the 18 belonged to Euro-American lineage, while eight were East-Asian (Beijing), and one was Indo-Oceanic.
Heterogeneous Populations in pncA. The Tables ST7 and ST2). In 13 of the 34 PZA R isolates, the heterogeneous variant was the only polymorphism in the three genes. Nine heterogeneous variants were observed in the 13 isolates (Supplementary  Table ST8). The most frequent of these variants was the novel insertion of C in nucleotide 453 which was observed in five resistant isolates. Three belonged to East-Asian, three belonged to Indo-Oceanic, two belonged to CAS, and one was Euro-American, ruling out the possibility of clonal expansion. In rpsA (coding and promoter region), 21 isolates (16 PZA R , 5 PZA S ) had heterogeneous calls, and in panD, 11 isolates (8 PZA R , 3 PZA S ) exhibited this  Table 3. Frequency of mutations in pncA and its promoter, panD and its promoter, and rpsA and its promoter in M. tuberculosis clinical isolates. WT = wild-type. *Includes heterogeneous variations (i.e. mixed populations with mutant and WT gene/promoter). † Also includes isolates that harbor synonymous mutations (only) in the gene.
behavior. No isolate had heterogeneity in more than one of the studied genes. The frequency of a heterogeneous observation in pncA of PZA R isolates was notably higher than that of PZA S isolates (15% resistant versus 8% susceptible). These frequencies were (7% versus 7%) for rpsA and (4% versus 4%) for panD (Supplementary  Table ST7).

Discussion
This multinational study is based on strains collected from pulmonary TB patients in four high burden countries. Our primary objective was to determine the accuracy of a molecular test to diagnose PZA resistance in our set of MDR-and XDR-TB patients from high TB burden regions. Three genes, pncA, rpsA, and panD, were considered in this study. Although each gene has been well-studied independently, a concurrent assessment of all three in MDR-and XDR-TB patients from multiple high-burden countries has been limited.
Correlation with resistance to other drugs. Among our isolates, broader resistance to other drugs directly translated to higher prevalence of PZA resistance. None of our isolates that were susceptible to the other seven drugs were PZA R , while 38% of those mono resistant to one of the other drugs, 69% of MDR isolates, and 86% of XDR isolates were PZA R (Supplementary Figure SF2). While it is known that PZA resistance is associated with MDR status 2 , such a direct relationship to the breadth of resistance to second line drugs, beyond MDR, is less established and should be investigated further in larger cohorts.
Phenotypic accuracy. Both false resistance and susceptibility have been noted for MGIT PZA DST in literature. The prevalence of false resistance, however, has been reported to be unusually high 11 . Over-inoculation is a common cause of false resistance and is a frequently the suspected reason for phenotypic-genotypic discordance 31,32 . Using orthogonal confirmation (see the Phenotyping section in Methods), we confirmed lack of enzymatic activity in all 22 phenotypic-genotypic discordant PZA R isolates (with WT pncA and promoter) ( Table 2). We were also able to establish the presence of enzymatic activity in 19 of the 22 phenotypic-genotypic discordant PZA S isolates (with a mutant pncA) (  36 . We report a specificity of pncA-based diagnostics in our study at 65% (25/72 PZA S isolates had a mutant pncA or promoter) which is notably lower than the global specificity of 90% 14 , but within the reported regional ranges 14,23,39 . It is known that not all changes affect the function of the enzyme significantly. The lower specificity in our study is due to the large number of novel pncA polymorphisms observed in our PZA S isolates. Since the phenotype has been confirmed by the enzymatic activity, as mentioned, we believe that these mutations either do not change the function of the enzyme or do so minimally causing low levels of resistance below MGIT cutoff. An important note about the sensitivity and specificity reported here is that these percentages depend on the criteria used for their calculation. Several such criteria have been proposed in the literature. The most common, and one that we have used, employs all pncA promoter or PZase variants as a marker for resistance. This provides the highest sensitivity but low specificity. Alternatively, all mutations observed in susceptible isolates could be excluded, providing 100% specificity but dramatically reducing sensitivity from 90% to 65% (146/224) in our study. Miotto et al. 23 propose a few different criteria, all of which would produce percentages between the two extremes calculated here. For instance, excluding mutations over 20 base pairs transcriptionally upstream of the gene would improve specificity from 65% to 70% but reduce sensitivity below 90%. The choice of this criterion will have important implications for molecular diagnostic platforms. Supplementary Table ST5. Perhaps the one bearing most significance is that the Euro-American lineage is the most likely to escape molecular diagnostics, as it has the highest percentage (15%) of resistant isolates without a mutation in pncA or its promoter. Furthermore, the hot spot regions identified could indicate regional convergent evolution associated with PZA resistance.

Lineage trends. Several trends can be noted in
Role of rpsA. The gene rpsA has held a hotly debated position within the literature, as there have been multiple publications both supporting and dismissing the gene's role in PZA resistance 24,26,40,41 While the trans-translational function of rpsA was shown to be inhibited by PZA (thought to explain PZA resistance in isolates without a pncA mutation) 24 , Alexander et al. were not able to find any phenotypically informative mutations in the gene 26 . Regardless, due to low prevalence, the predictive value of rpsA as an indicator for PZA resistance tends to be relatively low. In this study, only three heterogeneous frameshifts in rpsA (-C1065, -C1142, and -CA660) could potentially hold the molecular basis for otherwise unexplained resistant cases (Supplementary  Table ST2). All three were novel and their causal role in resistance needs to be confirmed. Assuming a causal role for all three mutations, the diagnostic sensitivity of RpsA was around 1% (3/224) in this study.

Role of panD.
Similarly, mutations in panD have been associated with PZA resistance in isolates with a WT pncA 25,30 . We only observed one such potential case. Dillon et al. postulated that media supplemented with pantothenate, certain pantothenate analogs, or other metabolites likely explained the PZA resistance of strains with mutations in panD 27 . The authors proposed that PZA resistance was independent of mutations in panD.
Causation. Generally, all changes in PZase are associated with PZA resistance 4,12 , yet this is not true for all mutations 14,23 . Some changes may still render a functional enzyme 42,43 , leading to low level resistance. Understanding the effects of these changes on the enzyme is crucial. Furthermore, the prevalence of unexplained resistance cases is currently likely underestimated since laboratory confirmation of the role of most variants in resistance is still lacking. This work could identify polymorphisms that do not play a role in resistance, or cause low levels of resistance well below the cutoff, adding to the number of unexplained resistance cases. As such, the causal role of all pncA, rpsA, and panD mutations in resistance should be experimentally confirmed in M. tuberculosis, similar to studies performed on inhA 44 , katG 45 , rpoB 46,47 , and gyrA 48, 49 . Alternative mechanisms. In this study, 18 PZA R isolates lacked any polymorphisms in the three genes.
Other studies have also reported such resistant cases 14,23 . A complementary or alternative mechanism of resistance, other than rpsA and panD, is most frequently suspected. A complementary mechanism regulating expression of PZase would explain the resistant cases with a WT gene. rpsA and panD do not adequately address this problem since both are considered targets of POA (activated form of PZA by PZase) 24,25,27 . Such a mechanism has proven to be elusive in spite of efforts in a number of laboratories around the world.
Heterogeneity. Chemotherapy in a host with mixed bacterial population selects for the resistant subpopulation 50 . Undetected heterogeneity could be an explanation for unexplained resistant cases. Using our WGS approach, we were able to detect low levels of heteroresistance, closer to the sensitivity of MGIT DST (10%) 51 . It is still possible that some unexplained resistance cases are due to existence of small resistance subpopulation detectable by DST but not by our WGS. Heterogeneity, as detected by our approach, seemed to be a random event in panD and rpsA with equal frequencies among resistant and sensitive isolates (Supplementary Table ST8). In pncA, however, heterogeneity had a notably higher frequency (nearly two-fold) among resistant as compared to susceptible isolates with a diagnostic specificity of 92% (Supplemental Table ST7). The association of heterogeneity with phenotypic resistance and its utility in diagnostics needs to be investigated in a larger cohort. In this study, we considered resistant cases with heterogeneous pncA mutations as explained cases.

Conclusion
A diagnostic approach, based on all pncA mutations, seems to be more appropriate than any selective criterion suggested in the literature as a diagnostic platform would err more on the false resistance side. While pncA as a whole demonstrated high association with PZA phenotype, rpsA and panD did not among our isolates and elsewhere 52 . The existence of 18 PZA R isolates lacking PZase activity with WT promoter and coding regions of the three genes may suggest a missing regulatory component in the currently understood mechanism of resistance. The high number of novel variations in PZase of PZA S isolates may suggest an undersampling of PZA-susceptible XDR-TB isolates in sequencing. For a comprehensive picture of the pncA genotype, this needs to be corrected. We also demonstrated that heterogeneity in pncA may not be a random event and that there are lineage-specific patterns among pncA mutations.
Overall, the results of this study demonstrate that a molecular diagnostic platform may suffer from a notable false resistance or false susceptibility error rate among MDR-and XDR-TB cases. In high TB burden countries this would introduce a non-negligible number of misdiagnosed cases.

Materials and Methods
Isolate Selection. M. tuberculosis strains were isolated from patient sputum in four countries (India, Moldova, Philippines, and South Africa). This effort was performed as part of a separate project called the Global Consortium for Drug-resistant tuberculosis Diagnosis (GCDD) 53 . Details of patient selection and sample collection methodology are described by Garfein et al. 28 and in the Supplementary Methods. All sequencing and phenotypic data was downloaded from the publically available repository on NCBI (BioProject: PRJNA353873).
MIRU-VNTR, Spoligotyping, and Lineage Determination. Genotyping using mycobacterial interspersed repetitive units variable number of tandem repeats (MIRU-VNTR) and spoligotyping were described by Garfein et al. 28 . Lineage determination based on MIRU and spoligo information was also described by Garfein et al. 28 . A brief summary is provided in the Supplementary Methods.
Phenotyping. All isolates were tested for phenotypic resistance to seven first-and second-line drugs, INH, RIF, three injectable antibiotics (kanamycin, amikacin, capreomycin), and two from the quinolone group of drugs (moxifloxacin and ofloxacin). DST results for these drugs have been previously published by Garfein et al. 28 . Standard BACTEC MGIT 960 methods were performed using WHO recommended critical concentrations.
PZA susceptibility testing was performed on BACTEC MGIT 960 for this study. Isolates with discrepant phenotypic and pncA genotypic results were further examined for PZase activity 54 . A brief description of both susceptibility testing and enzyme activity is located in the Supplemental Methods.
To test the validity of our phenotypic results we explored two common validation approaches: parallel and orthogonal testing. Parallel testing would require the DST to be repeated while orthogonal testing would require an independent method with an independent error profile as compared to that of MGIT DST, such as Wayne's enzymatic assay. Its error profile is the opposite of DST: higher false sensitivity rates but much lower false resistance rates 55 . Because this study aims to identify molecular markers that can be used for diagnosis of resistance, we chose the orthogonal approach for its lower likelihood of false resistance error. A wide range of rates for false resistance has been reported for PZA DST with some as high as 60% 32,56,57 . Recently this rate was estimated to be at 11.3% by Murray et al. 58 . The parallel approach with a repeat DST, therefore, would have ~1% false resistance rate (assuming a white noise, nonsystematic, random error event-otherwise higher) when both results agree. Wayne's assay has a 3% false resistance rate as most of its errors belong to the false sensitivity category 55 . As such, orthogonal testing has a false resistance rate of 0.3% (1/3 that of the parallel approach) when the results of both tests agree. Additionally, since it has not yet been established whether the false resistance rate of DST is a problem with the method or a characteristic of certain isolates (e.g. "flip-flopping" between multiple DST results), orthogonal testing allows the separation and investigation of the two potential causes.
Whole-Genome Sequencing. Sample and library preparation and post-sequencing analysis are described in the Supplementary Methods. Base calling was performed by consideration of reads supporting major and minor variants. Positions with a minor variant were labeled as "heterogeneous", otherwise, the position was considered "monoclonal". Minor variants were called using the criterion suggested by Black et al. 59 . For genotypic-phenotypic analysis we considered heterogeneous populations (in positions of consequence) as resistant since a mutant subpopulation was detected.
For this study, we considered the promoter (200 base pairs transcriptionally upstream from the annotated start site) and the coding regions of three genes: pncA (rv2043c), rpsA (rv1630), and panD (rv3601c). The genome positions for the six regions based on H37Rv reference (GenBank accession NC_000962.3) are listed in Supplementary Table ST9. These regions were examined for presence of genomic variation using the variant analysis methods described above and in the Supplementary Methods.