Genomic epidemiology of Mycobacterium tuberculosis in Santa Catarina, Southern Brazil

Mycobacterium tuberculosis (M.tb), the pathogen responsible for tuberculosis (TB) poses as the major cause of death among infectious diseases. The knowledge about the molecular diversity of M.tb enables the implementation of more effective surveillance and control measures and, nowadays, Whole Genome Sequencing (WGS) holds the potential to produce high-resolution epidemiological data in a high-throughput manner. Florianópolis, the state capital of Santa Catarina (SC) in south Brazil, shows a high TB incidence (46.0/100,000). Here we carried out a WGS-based evaluation of the M.tb strain diversity, drug-resistance and ongoing transmission in the capital metropolitan region. Resistance to isoniazid, rifampicin, streptomycin was identified respectively in 4.0% (n = 6), 2.0% (n = 3) and 1.3% (n = 2) of the 151 studied strains by WGS. Besides, resistance to pyrazinamide and ethambutol was detected in 0.7% (n = 1) and reistance to ethionamide and fluoroquinolone (FQ) in 1.3% (n = 2), while a single (0.7%) multidrug-resistant (MDR) strain was identified. SNP-based typing classified all isolates into M.tb Lineage 4, with high proportion of sublineages LAM (60.3%), T (16.4%) and Haarlem (7.9%). The average core-genome distance between isolates was 420.3 SNPs, with 43.7% of all isolates grouped across 22 genomic clusters thereby showing the presence of important ongoing TB transmission events. Most clusters were geographically distributed across the study setting which highlights the need for an urgent interruption of these large transmission chains. The data conveyed by this study shows the presence of important and uncontrolled TB transmission in the metropolitan area and provides precise data to support TB control measures in this region.

Tuberculosis (TB) remains a major public health threat and is currently the tenth leading cause of death worldwide and the leading cause of death by a single infectious microorganism. In 2018, 1.5 million people died from TB and 10 million new cases are estimated to have occurred worldwide. Brazil, along with the Russian Federation, India, China and South Africa-the BRICS countries-account for more than 40% of the global TB disease burden in incidence and deaths, and about 58% of the global burden of drug-resistant TB. In Brazil, 90,527 TB cases were reported (45 per 100,000 population) in 2018 1 . While the state of Santa Catarina in Southern Brazil is characterized by an intermediate TB incidence rate-23.7 per 100,000 population, the state capital Florianópolis shows a higher incidence rate of 46.0 per 100,000 population along with high rates of TB-HIV co-infection 2 .
Scientific RepoRtS | (2020) 10:12891 | https://doi.org/10.1038/s41598-020-69755-9 www.nature.com/scientificreports/ Worldwide, the asymmetrical distribution of the disease is also linked with distinctive lineages and genotypes of the Mycobacterium tuberculosis (M.tb) Complex species, the etiological agent of TB 3 . Early strain typing methods greatly boosted our understanding of TB transmission dynamics while simultaneously providing additional tools and perspectives on the phylogeographic landscape of such strains at a global level 3 . These so-called classical typing methods, i.e., those based on the characterization of genetic repetitive elements, have been successful in informing public health interventions due to the inability of traditional contact tracing approaches to reconstruct complex transmission chains [4][5][6] . In fact, population-based typing of M.tb clinical isolates has been important to infer on the degree of recent infection and highly relevant in the identification of specific risk factors and geospatial hotspots for intensified TB transmission. Two classical examples of such studies are those conducted in San Francisco and New York in the late 1980s and early 1990s that enabled the correlation of specific ethnicities, drug resistance and/or residence in specific areas with recent TB transmission as assessed by genetic clustering based on profiles obtained with Restriction Fragment Length Polymorphisms with IS6110 6,7 . In the next decades, classical genotyping methods such as Spoligotyping and Mycobacterial Interspersed Repetitive Unit-Variable Number of Tandem Repeat (MIRU-VNTR) have been widely used for molecular epidemiology studies 8 . While these methods have been an integral part of several national TB control plans, most of the genomic diversity and distinctiveness of each isolate is overlooked by typing methods that are only able to interrogate a minor fraction of an isolate's genome [9][10][11] . In contrast, Whole Genome Sequencing (WGS) has emerged as the leading typing strategy to study the dissemination and transmission dynamics at a genome-wide SNP-based resolution that clearly outperforms classical typing methods and, coupled with molecular evolutionary approaches, also holds the potential to retrace the evolutionary history of locally circulating strains 12,13 . This level of resolution has enabled the reconstruction of the microevolutionary trajectory of circulating extensively drug resistant strains in Portugal and South Africa along with the breakdown into genomic clusters that are proposed to reflect epidemiological links between patients [14][15][16] .
In Santa Catarina state, in Brazil, it is known that the M.tb population structure is mostly dominated by Latin American and Mediterranean (LAM) strains, followed by the "ill-defined" T strains as assessed by spoligotyping 17 . However, nothing is known regarding the genomic diversity associated with this region and reports using higher resolution typing methods are limited to 12-loci MIRU-VNTR 18 . To address this gap, herein we report the preliminary results from a genomic epidemiological study in the metropolitan area, aiming to unravel the M.tb genotypes circulating in this region, infer on existing and active transmission clusters in the community, and examine the distribution of drug resistance while exploring possible associations between unfavorable treatment outcomes and SNP-based lineages.

Results
Study sample and epidemiology. The Table S1). In addition, nine (6.0%) individuals were homeless and four (2.6%) spent time in a prison establishment.
Drug resistance and molecular basis. Among the 151 isolates obtained for the study, genotypic-based prediction for drug resistance enabled the identification of six (4.0%) isoniazid resistant isolates, three (2.0%) with rifampicin resistance, two (1.3%) streptomycin resistant isolates and, resistance to pyrazinamide and ethambutol was detected in one isolate (0.7%). Besides, resistance to second-line drugs were detected for ethionamide (n = 2; 1.3%) and for the FQ (n = 2; 1.3%). One MDR isolate was identified (Fig. 1).
The detected mutations underpinning these genotypic-based predictions are listed in Table 1 with the Ser315Thr at the katG gene being the most frequent mutation associated with isoniazid resistance (n = 4/6; 66.7%). Two additional inhA promoter mutations (inhA C-15T and T-8C) were putatively associated with isoniazid resistance and ethionamide resistance. Resistance to rifampicin was driven by two distinct rpoB mutations: Ser450Leu (n = 2; 1.3%) and Ser441Leu (n = 1; 0.7%). Two putative compensatory mutations were detected in the single MDR isolate found: ahpC − 48G>A and rpoC Phe452Ser 19 . The complete resistance-related mutation profile of all isolates is shown in the Supplementary spreadsheet. Genomic population structure and phylogeny. All strains had been previously characterized by classical spoligotyping by membrane reverse-hybridization methods, revealing a population structure composed mainly of LAM strains (n = 91; 60.3%), followed by the ill-defined T strains (n = 25; 16.4%) and Haarlem (n = 12; 7.9%). In total, 38 different SITs were found; the SIT 216/LAM5 (13.2%) was the most prevalent, followed by SIT 42/LAM9 (7.3%), SIT 17/LAM2 (7.3%) and SIT 64/LAM6 (7.3%) (Fig. 1). As all clinical isolates were subsequently subjected to WGS, and consistent with the spoligotyping-based classification, all 151 isolates were classified as M.tb Lineage 4 (Euro-American) across 13 of its sub-lineages with sub-lineages 4 .2%) being the most frequent. Moreover, screening for the fbpC 103 SNP marker for LAM confirmed the spoligotyping-based classification of all LAM strains while three isolates initially classified as belonging to the T family were found to bare fbpC 103 and, from a phylogenetic standpoint, should therefore be considered as LAM strains (SC61, SC115, SC134 [SIT823/T1]). Eleven additional isolates without clade assignment or of unknown profile on SITVIT2 also carry this marker (ten Orphans and one SIT2752) (Fig. 1) www.nature.com/scientificreports/ www.nature.com/scientificreports/ a deeply rooted monophyletic clade. All drug resistant isolates detected were scattered across the phylogenetic tree suggesting independent emergence of drug resistance associated mutations.

Snp-based clustering and M.tb transmission.
The average distance amid the core-genome of the 151 isolates was 420.3 SNPs. To delineate genomic clusters that might reflect recent transmission underpinned by putative epidemiologically linked patients, isolates within a maximum distance of 5 SNPs were assigned to genomic clusters (GC). Using this cutoff criterion, a total of 66 (43.7%) isolates were grouped into 22 clusters. The largest cluster (GC1) involved 12 isolates including one isolate obtained from a patient that spent time in a prison. The second largest cluster (GC3) involved six patients while the third largest clusters (GC14 and GC4) included four isolates each; GC14 with one isolate from an individual with incarceration history and GC4 including two epidemiologically-linked patients that shared the same household (pairwise distance: 0 SNPs). The remaining 18 genomic clusters included 42 isolates: one cluster included one isolate from an individual that spent time in a prison establishment; another cluster included a homeless patient; and, one other cluster was composed of two household contacts (pairwise distance: 0 SNPs; Supplementary Table S2). All clusters except GC1, GC8 and GC22 formed monophyletic branches in the phylogenetic tree suggesting further circulation and differentiation of strains within the same branch. The GC2 branch stemmed from within the GC1 clade. GC5 (n = 2) and GC22 (n = 2) isolate appear to have emerged from a broader late-branching phylogenetic SIT73/T (sub-lineage 4.1.2) clade that may represent a group of a mostly non-clustered strain that underwent extensive circulation in the community. A minimum spanning tree (MST) obtained for all isolates confirmed the integrity of all clusters highlighting a more extensive dissemination of GC1 isolates along with the observation of independently generated drug resistance associated nodes in the tree (Fig. 2, Supplementary Figure S1). In the present sampling, the geographical distribution of patients across metropolitan region by residence location showed a higher concentration of cases among patients living at the central region of Florianópolis. Nevertheless, genome clustered cases did not show any particular pattern of geographical clustering as patients whose isolates were clustered showed a wide geographical dispersion in the studied region (Fig. 3). Comparing the pairwise geographic distances between patients' residence ( Fig. 3, Supplementary Figure S2) no statistically significant difference was observed between non-clustered patients (average pairwise distance: 13.7 km; range 0-45 km) and patients within the seven largest clusters (n ≥ 3 patients) except for: patients in GC2 (average pairwise distance: 3.7 km; range 1-5 km) which showed a statistically lower pairwise distance (p = 0.0302, Wilcoxon Rank Sum Test); patients in GC4 (average pairwise distance: 2.2 km; range 0-5 km) which also showed a statistically lower average pairwise geographic distance (p = 0.0004, Wilcoxon Rank Sum Test); and, likewise for patients in GC10 (average pairwise distance: 2.2 km; range 0-5 km; p = 0.0004, Wilcoxon Rank Sum Test). The geographical distribution of the analysed cases is similar to the 263 cases reported for Greater Florianópolis in the same period (Supplementary Figure S3). treatment outcome and statistical association. According to the data collected, 107 (70.9%) patients were declared cured, treatment failure occurred in four (2.6%) patients and six (4.0%) patients died. Additionally, treatment dropout was observed for 33 (21.9%) patients and no information on the treatment outcome was available for one (0.7%) patient (Table1).
We found a statistically significant association between TB outcome and HIV coinfection (p = 0.022, Fisher's exact Test) since the coinfected patients had a statistically lower cure rate (75% vs 94.5%, respectively) and a higher death rate (18.7% vs 2.2%, respectively). No statistical association between TB outcome and the others risk factors (alcohol intake, illicit drugs usage and clustering) was found (p ≥ 0.05; Table 2).
Regarding treatment dropout and risk factors, we found that: higher treatment dropout rates were statistically associated with excessive alcohol intake (p = 0.009, Chi-square Test) and HIV coinfection (p = 0.009, Chi-square Test); HIV coinfection was associated with illicit drug usage (p ≤ 0.001, Chi-square Test), alcohol www.nature.com/scientificreports/ intake (p ≤ 0.001, Chi-square test) and lower rates of hemoptysis (p = 0.011, Chi-square Test; Table 2). Illicit drug usage and excessive alcohol intake were also found to be associated (p ≤ 0.001, Chi-square Test). No statistically significant association was found between clinical characteristics and SNP-based M.tb lineages. We tested also the hypothesis of a possible association between the outcomes and/or clinical characteristics and clustering, however, no association was found. Likewise, no significant association was found between risk factors and clustering (data not showed).  17,18,20 . The present study comprises the first genome-wide study on M.tb in Santa Catarina state. Greater Florianópolis has a higher TB incidence when compared to the overall incidence rate on the state level, an intermediate prevalence setting 21 . In this study sample we found a high rate of TB-HIV co-infection (19.71%) that is consistent with the rate expected for this region (22.6%) and is well above the national rate (9.4%) 22 . Co-infected patients were associated with a low cure rate and excessive consumption of alcohol and drugs. HIV-coinfection is a widely known risk factor for TB development since the risk for TB development is 19 times higher in the population living with HIV when compared with the rest of the population, leading to poorer outcomes and lower relapsefree cure rates 1,23 . Nonetheless, alcohol abuse or unhealthy alcohol usage is an increasingly acknowledged risk factor known to affect the outcome of TB treatment and a risk factor for TB treatment adherence 24,25 . Recent data obtained in Uganda and Kenya further demonstrate that alcohol use among HIV-infected individuals appears www.nature.com/scientificreports/ to be associated with decreased viral suppression due to the lower diagnosis rate and lower likelihood of being on an anti-retroviral treatment regimen if already HIV diagnosed 26 . In this regard, an integrated approach to reduce unhealthy alcohol consumption or illicit drug usage may lead to better outcomes 27 . WGS data enabled the screening of drug-resistance conferring mutations on a genome-wide scale which allowed the identification of 10 (6.6%) isolates resistant to at least one anti-TB drug and one (0.7%) MDR-TB isolate. Despite the low MDR-TB rate in this study, two clinical isolates from different patients are predicted to be mono-resistant to the FQs due to two distinct high-confidence gyrA mutations for FQs resistance prediction 16,28 . FQ mono-resistance or among non-MDR isolates has been reported across multiple settings with varying ranges 29 . Recently Kim et al. 30 reported a 0.8% FQ resistance rate among non-MDR strains detected across multiple hospitals in South Korea while revealing an increasing trend over the last two decades. TB patients with prior FQ prescription to TB diagnosis, usually to treat community-acquired pneumonia, have a three-fold higher risk of having FQ-resistant TB 31 . Multiple FQ prescriptions, FQ prescription more than 60 days prior for TB diagnosis and for more than 10 days are associated with FQ-resistant TB 31,32 . This study shows a 1.3% FQ-resistance rate among non-MDR-TB patients for this setting driven by two independent mutational events. One limitation to the interpretation of the data lies in the fact that no data regarding previous FQ exposure was obtained for these two patients and, as such, we cannot exclude these from being primary FQ mono-resistant TB cases 33 . Nonetheless, the detection of these two FQ mono-resistant isolates warns against a possible excessive usage of FQs in the community and calls upon additional antimicrobial stewardship measures since to our knowledge these comprise the first two FQ mono-resistance cases to be reported in Brazil.
Regarding M.tb diversity, SNP-based typing classified all the 151 isolates as Euro-American lineage 4 strains. The Euro-American lineage predominance in Santa Catarina State and in the Southern Brazil 17,34 , occurred due to migratory processes from Europe to South America that increased in the seventeenth century 35 . Therefore, LAM, T and Haarlem, the most common spoligotyping families identified in this study, were also found in other studies in Southern Brazil 17,20,34,36,37 . Our examination of the distribution of the fbpC 103 SNP, considered as a highly specific marker for the LAM family not only confirmed the identification of all spoligotyping-based identified LAM, but increased the frequency of this lineage to 69.5% (105/151). The three SIT823/T1 isolates, assigned to the "ill-defined" T family according to SITVIT2 are in fact phylogenetically positioned as LAM in the phylogenetic tree.
In accordance with previous studies 38, 39 , using a conservative five-SNP threshold to define genomic transmission clusters enabled the clustering of 43.7% of the isolates included in the study thereby demonstrating that approximately half of the TB cases in metropolitan region between 2014 and 2016 were due to recent transmission. One recent epidemiological transmission study conducted in the Pará State, in North Brazil, across household contacts did find two cases in the same household whose isolates are described as being 9 SNPs apart 40 . In this latter study, the route of transmission is unclear since the temporal distance between these isolates was 7 years and therefore questions if the transmission between these cases was direct and, if missing links are likely to exist. Herein, we opted to use a 5 SNP distance cutoff as to obtain high-confidence genomic clusters that are driven by recent transmission. Also, no association was found between clustering and risk factors, treatment dropout and clinical characteristics when using 12 or 25 SNPs cutoff distances to define clusters (data not shown).
The recent transmission scenario herein obtained lends support to a continued TB transmission, most likely still ongoing. A limitation to the study relies in the fact that only 151 (57.4%) cases were analyzed from a total of 263 reported cases for the same period in Greater Florianópolis and missing links in transmission chains are therefore likely to exist. However, the data conveyed has already enabled the detection of one large transmission cluster (GC1) that is responsible for 7.9% of the cases analyzed. All the isolates grouped into GC1 belonged to M.tb sublineage 4.3.3 and to SIT216/LAM5. Interestingly, SIT216/LAM5 has been described in Santa Catarina www.nature.com/scientificreports/ state as the most prevalent SIT in prison establishments but only one isolate in GC1 had spent time in prison 18 . These facts highlight the epidemiological importance of SIT216/LAM5 clones, now at the community level, and its association with recent transmission. LAM5 spoligotype is also the most reported in Rio Grande do Sul, another state in Southern Brazil, however belonging to SIT93 34 . A parallel situation has been reported with SIT863 MDR-TB strains in Rio Grande do Sul, that was initially identified in prison establishments but is now responsible for the majority of MDR-TB cases in that state 37,41 . The presence of isolates from inmate individuals in the TB transmission chains supports the dissemination of strains between the general population and prison establishments and although the directionality is unclear at this point, the latter along with the entire prison system may act as reservoir of specific strains and promote a wider dispersion of specific clones 42 . Herein, the GC1 cluster showed a widespread geographical distribution when patient residence is considered (Supplementary Figure S2). Additionally, the low core-genome SNP distances among the isolates in these largest genomic clusters, may indicate that transmission has occurred in a very recent timeframe and is likely ongoing, suggesting that in order to prevent further spread of these strains, closer surveillance of these phylogenetically distinct clades is highly important.
Here we conducted the first WGS-based M.tb diversity study in SC state. Our study has some limitation, among them, for several samples phenotypic DST were not available, thus, we used WGS data to drug resistance prediction. Besides that, the present study only included new TB cases that started a first-line anti-TB treatment. Although the sampling in the present study occurred from 2014 to 2016, more recent data show an increase in the incidence TB rate (43.5% in 2016-46.0% in 2018) 2,22 and it would be interesting to evaluate if this is due to strains belonging to the main transmission chains herein identified. This fact reinforces the importance of molecular epidemiology as an instrument for TB surveillance and supporting public health measures.

conclusions
The present study stresses the importance of WGS-based approaches that enable high-resolution phylogenetic analysis to investigate M.tb transmission in Brazil and when compared with classical genotyping techniques that may lead to overestimated clustering rates 43 . In this study we undertook a WGS-based analysis of the genetic diversity and recent M.tb transmission in a Southern Brazil region improving the knowledge about TB dynamics in this setting and filling out the lack of genomic data on M.tb strains circulating in the state of Santa Catarina. The data shows that uncontrolled TB transmission in the metropolitan region of Florianópolis occurred and provides precise data to support TB control measures in this region. The estimated population in the both cities is 700,000 inhabitants 44 and the TB incidence rate approximately 46 cases per 100,000 population 21 . During the study period, a total of 263 new pulmonary TB cases with positive culture were notified, of these, 218 were able to contact and accepted to participate in the study; 151 had DNA available for sequencing, representing 57.4% of the total (Supplementary Table S3). Mycobacterium tuberculosis was isolated from sputum samples obtained from patients before starting anti-TB treatment. The treatment outcomes were assessed upon treatment completion and defined as cure, treatment failure (persistence of smear microscopy and/or culture positives after the treatment), treatment dropout or death.

Materials and methods
Smear microscopy and culture. The 56 and implemented in TB-Profiler. The fbpC 103 polymorphism (G->A at codon 103) was used to differentiate LAM strains from non-LAM strains 57 . Phylogenetic analysis was performed using Snippy pipeline v4.3.6 (https ://githu b.com/tseem ann/snipp y) for variant calling and alignment of all core SNP variants. SNP positions within PE/PPE genes or other repetitive regions associated with low mappability scores were removed from the final core-genome alignment, which was composed of 17,027 positions. A maximum-likelihood phylogenetic tree was generated using the PhyML, applying the generalized time reversible (GTR) model and branch support assessed by the approximate Likelihood Ratio Test (aLRT) as implemented in Seaview 58 . The resulting tree was rooted using M. canettii (Genbank accession number: NC_019950.1). A minimum spanning tree was generated using Phyloviz (v2.0) and the therein implemented goeBURST algorithm 59 . We used a 5, 12 and 25 SNPs cut-off to delineate genomic clusters 60 among the core SNP alignment using R along with the ape package and the hclust function. The patient address was plotted in a map using the online tool Microreact (www.micro react .org). Pairwise geographical distances between patients were assessed in R using the Imap package.