Genotypic clustering does not imply recent tuberculosis transmission in a high prevalence setting: A genomic epidemiology study in Lima, Peru

Background: Whole genome sequencing (WGS) can elucidate Mycobacterium tuberculosis (Mtb) transmission patterns, but more data are needed to guide its use in high-burden settings. In a household-based transmissibility study of 4,000 TB patients in Lima, Peru, we identified a large MIRU-VNTR Mtb cluster with a range of resistance phenotypes and studied host and bacterial factors contributing to its spread.
Methods: WGS was performed on 61 of 148 isolates in the cluster. We compared transmission link inference using epidemiological or genomic data, with and without the inclusion of controversial variants, and estimated the dates of emergence of the cluster and of antimicrobial drug resistance acquisition events by generating a time-calibrated phylogeny. We validated our findings in genomic data from an outbreak of 325 TB cases in London. Using a larger set of 12,032 public Mtb genomes, we determined bacterial factors characterizing this cluster and under positive selection in other Mtb lineages.
Findings: Four isolates were distantly related and the remaining 57 isolates diverged ca. 1968 (95% HPD: 1945-1985). Isoniazid resistance arose once, whereas rifampicin resistance emerged subsequently at least three times. Amplification of other drug resistance occurred as recently as within the last year of sampling. High quality PE/PPE variants and indels added information for transmission inference. We identified five cluster-defining SNPs, including esxV S23L, as potentially contributing to transmissibility.
Interpretation: Clusters defined by MIRU-VNTR typing could be circulating for decades in a high-burden setting. WGS allows for an improved understanding of transmission, as well as of bacterial resistance and fitness factors.
Funding: The study was funded by the National Institutes of Health (Peru Epi study U19-AI076217 and K01-ES026835 to MRF). The funding sources had no role in any aspect of the study, manuscript or decision to submit it for publication.
Research in context
Evidence before this study: Whole genome sequencing (WGS) has been shown to offer higher resolution than traditional typing methods for studying tuberculosis (TB) transmission in low-burden settings. The implications of its use in high-burden settings are not well understood.
Added value of this study: Using WGS, we found that TB clusters defined by traditional typing methods may be circulating for several decades. Genomic regions typically excluded from WGS analysis contain a large amount of genetic variation that may affect interpretation of transmission events. We also identified five bacterial mutations that may contribute to transmission fitness.
Implications of all the available evidence: The added value of WGS for understanding TB transmission may be even higher in high-burden than in low-burden settings. Methods integrating variants found in polymorphic sites and insertions and deletions are likely to have higher resolution. Several host and bacterial factors may be responsible for higher transmissibility and could be targets of interventions to interrupt TB transmission in communities.


VNTR cluster spanning pan-susceptible to MDR-TB isolates that was identified in 4,000 TB patients enrolled in a household transmissibility study. We examine both host and TB genotypic data to understand the evolution of isolates within this cluster, infer the timing of emergence of antibiotic resistance, and identify genetic bacterial factors unique to this cluster that may have contributed to its success.

and consent have been previously described. 33,34 Briefly, patients were enrolled if they were diagnosed with pulmonary TB (PTB) at public health clinics and were followed through therapy. Their household contacts were also followed with tuberculin skin testing and monitored for

Please refer to the supplement for bacterial culture, drug susceptibility testing (DST), DNA sequencing methodology and phylogenetic analysis.

(Table 1).

About one-half were smear-positive (51.4%) and one-third used alcohol (35.8%) or other intoxicating substances (27.9%). Two patients had isolates that did not meet our sequencing quality criteria and were excluded. Of those patients with high quality sequence data (n=61), a larger majority were male with MDR-TB, but they were otherwise comparable to the superset of 148 (Table 1). Sequencing data revealed that 58 isolates belonged to the Latin America-

The geographic distribution of strains based on household coordinates, colored by resistance pattern, is shown in Figure 1. Comparison between genetic and geographic distance did not support spread of the cluster in a single geographic direction, even when three most distant

other). A pair of isolates collected two months apart from a host who had not been on treatment (MDR strains M06 and M10) was found to have no SNP differences outside of PE/PPE regions.

Looking at genetic evidence alone for recent transmission using a distance cutoff of

Other than the pair (index parent M23 and contact child M12), none of these belonged to the same household. With the addition of high-confidence indels and PE/PPE SNPs and using the cutoff of ≤ 7 variants (the distance between the two isolates belonging to the same patient when high-confidence indels and PE/PPE variants were included), there were 41 links among 30 patients, i.e. 76% fewer links than when these variants were excluded. Phylogenetic trees built by

Table 2). In addition to identifying the known mutations that confer resistance to isoniazid and rifampicin, we found an

prevalence settings have noted shorter genetic distance between isolates, 2,3,13 and in one case the distances were insufficient to reliably and consistently inform contact tracing interventions. 13
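
The cutoff-based link counting described above (≤ 5 SNPs, or ≤ 7 variants once indels and PE/PPE variants are added) can be illustrated with a minimal Python sketch. It is not the study's pipeline: the function names `genomic_links` and `isolates_in_links`, the isolate labels and all distances are hypothetical and chosen only to show how adding variants can lengthen pairwise distances and reduce the number of inferred links.

```python
def genomic_links(distances, cutoff):
    """Return sorted isolate pairs whose pairwise distance is <= cutoff.

    distances: dict mapping a pair of isolate IDs (a, b) to the number of
    variants that differ between the two isolates.
    """
    links = [tuple(sorted(pair)) for pair, d in distances.items() if d <= cutoff]
    return sorted(links)


def isolates_in_links(links):
    """Return the set of isolates that take part in at least one link."""
    return {isolate for pair in links for isolate in pair}


# Hypothetical toy distances, NOT study data.
# SNPs only (PE/PPE regions and indels excluded):
toy_snp = {("A", "B"): 2, ("A", "C"): 4, ("B", "C"): 5, ("C", "D"): 3}
# Same pairs after adding high-confidence indels and PE/PPE variants:
toy_all = {("A", "B"): 3, ("A", "C"): 9, ("B", "C"): 11, ("C", "D"): 6}

snp_links = genomic_links(toy_snp, cutoff=5)   # SNP-only cutoff
all_links = genomic_links(toy_all, cutoff=7)   # cutoff with added variants

# Fraction of links lost when the extra variants are included.
reduction = 1 - len(all_links) / len(snp_links)
print(len(snp_links), len(all_links), reduction)
```

In this toy example four SNP-only links shrink to two once the additional variants are counted, which mirrors the direction, though not the magnitude, of the reduction reported above.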

It is possible that certain features of our selected cluster have led to the observation of such high levels of diversity. First, our cluster spans the spectrum from pan-susceptible to resistant to 7 drugs; second, our isolates all belong to lineage 4, a lineage that has been noted to be the most phenotypically and genotypically diverse of the TB lineages. 39 However, the proportion of diversity that could be linked directly to drug resistance was low. A parsimonious explanation of the high degree of observed genomic diversity is that MIRU-VNTR pattern evolution is on average slow, with pattern changes occurring on the order of decades. Despite this, MIRU-VNTR likely offers sufficient resolution in low prevalence settings as most TB cases there tend to be imported.

indels did not show significant differences, there were notable differences within the closely related cluster, highlighting that similarity measures that rely on SNPs alone could be misleading. Inclusion of indels and PE/PPE regions in estimation of divergence dates is limited by our current lack of knowledge regarding their evolutionary rates, but these regions account for an appreciable proportion of variation seen between closely related isolates and thus including this information may affect interpretation of transmission links. 43

We identified many genomic links using the SNP distance threshold of ≤ 5 criterion 3 that were not discovered within household contact investigation, providing evidence that household contact investigation is not sufficient to identify and treat secondary TB cases, as transmission can occur anywhere in the community. Additionally, 2 of 3 case pairs that belonged to the same household were found to have large genetic distances, making it more likely that transmission occurred outside the household. Although the dataset used was relatively small, these findings

Our phylogenetic dating procedures support that acquisition of MDR is not recent in Lima, and that MDR cases, given the observed phylogenetic structure, are mostly related to transmission.
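
The comparison of genomic links with household membership discussed above can be sketched as follows. This is an illustrative simplification under stated assumptions, not the authors' analysis: the function name `split_links_by_household`, the isolate labels and the household identifiers are all hypothetical.

```python
def split_links_by_household(links, household_of):
    """Partition genomic links into within-household and between-household pairs.

    links: iterable of (isolate_a, isolate_b) pairs already below the
           chosen genetic distance cutoff.
    household_of: dict mapping each isolate ID to a household identifier.
    """
    same_household, different_household = [], []
    for a, b in links:
        if household_of[a] == household_of[b]:
            same_household.append((a, b))
        else:
            different_household.append((a, b))
    return same_household, different_household


# Hypothetical toy inputs, NOT study data.
toy_links = [("A", "B"), ("A", "C"), ("C", "D")]
toy_households = {"A": "H1", "B": "H1", "C": "H2", "D": "H3"}

within, between = split_links_by_household(toy_links, toy_households)
print(len(within), len(between))
```

Links that fall in the second list are the ones a household contact investigation alone would not detect.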

This finding is consistent with other studies performed in other countries including South

The cluster under study was the largest such cluster observed in the Lima household transmission study. Its transmission success was likely due to both bacterial and host factors. We quantified the host predilection to transmit TB with the PTP measure and found the cluster to have a higher score than the median PTP measure reported by a study in the Netherlands. 35

have led to an underestimation of the diversity. However, the sampled subset demonstrated a substantial amount of diversity, more than would be expected within a cluster with identical MIRU pattern. 2,3 We also cannot exclude that the 2 outermost isolates were misassigned the reported MIRU pattern, and because of this we focused on the isolates confirmed to be of the same lineage by in silico spoligotyping and the WGS SNP barcode. Finally, it is important to note that our dating estimates are heavily reliant on the molecular clock rate that has been previously reported in the literature.
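
The dependence of dating estimates on the assumed clock rate can be made concrete with a back-of-the-envelope Python sketch. It assumes a strict molecular clock and two isolates sampled in the same year, so the expected pairwise distance is 2 × rate × time. This is a deliberate simplification and not the time-calibrated phylogenetic method used in the study. The function name `divergence_year`, the SNP distance, the sampling year and both clock rates are hypothetical illustration values.

```python
def divergence_year(snp_distance, sampling_year, snps_per_genome_per_year):
    """Approximate the year in which two isolates last shared an ancestor.

    Assumes a strict clock and that both isolates were sampled in
    sampling_year. Both lineages accumulate changes after the split, so
    expected distance = 2 * rate * years_since_split.
    """
    if snps_per_genome_per_year <= 0:
        raise ValueError("clock rate must be positive")
    years_since_split = snp_distance / (2 * snps_per_genome_per_year)
    return sampling_year - years_since_split


# Hypothetical inputs for illustration only, NOT values from this study.
for rate in (0.25, 0.5):
    year = divergence_year(snp_distance=40, sampling_year=2010,
                           snps_per_genome_per_year=rate)
    print(rate, year)
```

Under these assumptions, halving the clock rate doubles the inferred time since divergence, which is why the choice of a literature-derived rate matters for the reported dates.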