Whole genome sequencing of multidrug-resistant Mycobacterium tuberculosis isolates collected in the Czech Republic, 2005–2020

The emergence and spread of resistant tuberculosis (TB) pose a threat to public health, so it is necessary to diagnose the drug-resistant forms in a clinically short time frame and closely monitor their transmission. In this study, we carried out a first whole genome sequencing (WGS)-based analysis of multidrug resistant (MDR) M. tuberculosis strains to explore the phylogenetic lineages diversity, drug resistance mechanisms, and ongoing transmission chains within the country. In total, 65 isolates phenotypically resistant to at least rifampicin and isoniazid collected in the Czech Republic in 2005–2020 were enrolled for further analysis. The agreement of the results obtained by WGS with phenotypic drug susceptibility testing (pDST) in the determination of resistance to isoniazid, rifampicin, pyrazinamide, streptomycin, second-line injectables and fluoroquinolones was more than 80%. Phylogenetic analysis of WGS data revealed that the majority of MDR M. tuberculosis isolates were the Beijing lineage 2.2.1 (n = 46/65; 70.8%), while the remaining strains belonged to Euro-American lineage. Cluster analysis with a predefined cut-off distance of less than 12 single nucleotide polymorphisms between isolates showed 19 isolates in 6 clusters (clustering rate 29.2%), located mainly in the region of the capital city of Prague. This study highlights the utility of WGS as a high-resolution approach in the diagnosis, characterization of resistance patterns, and molecular-epidemiological analysis of resistant TB in the country.

www.nature.com/scientificreports/ From 2005 to 2020, the incidence of TB as well as the related deaths decreased significantly in the Czech Republic, so it is currently considered a country with a low incidence of TB (Fig. 1).
Phenotypic drug susceptibility testing (pDST) is routinely performed on all TB culture-positive specimens in the country. The results from 2005 to 2020 showed a total of 480 strains of M. tuberculosis resistant to at least one antituberculosis drug. More specifically, in 2005-2020, the estimated proportion of MDR and XDR strains was 2.13% of all phenotypically tested isolates (of which 17.4% in previously treated patients) 3 .
Culture-based pDST on solid or liquid (Mycobacterium growth indicator tubes-MGIT) media and genotypic drug susceptibility testing (gDST) methods (such as Xpert MTB/RIF, Cepheid, Sunnyvale, California, USA and GenoType MTBDR, Hain LifeScience GmbH, Nehren, Germany) are currently preferred for the determination of M. tuberculosis resistance in clinical settings 4 . Limitations of pDST and gDST lead to the development and application of advanced molecular techniques such as whole genome sequencing (WGS). WGS data provide a rapid and detailed insight of mutations encoding resistance to all available antituberculosis drugs and contribute to the identification of the transmission patterns based on the genomic relatedness of M. tuberculosis strains, without the need for traditional contact tracing [5][6][7][8] . The molecular-epidemiological analysis of TB in the Czech Republic was conducted by so-called classical typing methods 9 . However, these methods interrogate only a small fraction of the M. tuberculosis genome to reconstruct the complex transmission chains and do not show sufficient discriminatory power to track the transmission links between patients 10 .
Despite the fact that a large proportion (more than 28%) of patients diagnosed with TB are born outside the Czech Republic, the genetic relatedness of drug-resistant M. tuberculosis strains has not yet been studied in detail 11 . The aim of this work was to perform the first WGS-based study using the MDR-, XDR-isolates collected in the Czech Republic during the years 2005-2020. Sequencing analysis determined the complete resistance profile and revealed the phylogenetic diversity of the strains, thus contributing to the characterization of local outbreaks and the monitoring of the dynamics of resistant strains over the years. This work will serve as a baseline to analyse the molecular epidemiology and drug-resistance of M. tuberculosis strains in the Czech Republic.

Results
Sample collection. A total of 65 samples of MDR M. tuberculosis strains were included, representing 60% of the total number of MDR-TB cases diagnosed during the study. A limited number of samples were due to the inability to reactivate some cultures in MGIT medium. Samples have been collected continuously over the years, specifically: 3 in 2005, 1 in 2006, 3 in 2008, 7 in 2009, 2 in 2010, 4 in 2014, 9 in 2015, 4 in 2016, 9 in 2017, 10 in 2018, 12 in 2019 and 1 in 2020. Subsequent isolates from the same TB patient were included in the study (only in two cases: CZ800-17/CZ541-18 and CZ628-17/CZ149-18) as long as the isolation date is more than six months after the isolation date of the previous isolate. The study included 14 women and 47 men with a mean age of 40 years (ranging from 23 to 81 years).
In overall, 10 isolates were determined as XDR.  (Fig. 2). Cluster analysis with a predefined maximum cut-off of 12 SNPs difference revealed 6 clusters containing 15 MDR isolates and 4 XDR isolates (n = 19; clustering rate 29.23%) with a maximum size of 2-5 isolates for each cluster (Fig. 2). Nine isolates were obtained from patients with newly diagnosed TB and 10 from patients with confirmed recurrent TB. These clusters consisted mainly of strains from Beijing sublineage 2.2.1 (73.68%) and to a lesser extent from Euro-American sublineage 4.1.2.1 (Haarlem) (26.32%). A difference of 0-2 SNPs between patients was determined in 8 cases (Fig. 2, CL0, CL2), indicating a recent transmission events and the difference 3-5 SNPs was documented in only 2 patients, indicating the direct transmission (Fig. 2, CL5). Using a threshold 6-12 SNPs to look at older transmission events, 9 patients in 4 clusters were identified (Fig. 2, CL12). The distance between the individual clusters ranged from 35 SNPs (between cluster 5 and 6) to 1066 SNPs (between cluster 3 and 1) (Fig. 3).

Phylogenetic analysis and identification of transmission cluster.
Four isolates (80%) from cluster 1 come from the capital city of Prague. Patient CZ8842 is from another region (Nový Jičín) but was first diagnosed with MDR-TB in 2005 (as well as patients CZ188 and CZ218), suggesting an association with other isolates from this cluster. Epidemiological data on patients grouped in clusters 2 and 3 also indicated the spread of a resistant strain of M. tuberculosis from Prague region. Cluster 5 was formed by three isolates, two of which related to the same accommodation facility in the town of Šenov. The remaining clusters consisted of only two isolates that came from geographically separated areas (Supplementary Table 1).

Discussion
WGS is a molecular-genetic method that is becoming increasingly preferred in clinical laboratories; however, it has not yet been performed on continuously collected MDR and XDR isolates. Therefore, we conducted the first WGS study in the country. Sequencing data in our study showed that in the last 15 years, MDR/XDR strains of M. tuberculosis belonging to the Beijing 2.2.1 lineage were the most common in the Czech Republic (70.77% of all isolates), followed by the Euro-American lineage 4 (29.23%). The results correlate with studies that have confirmed a growing incidence and transmission of MDR strains of this lineage in Eastern Europe, predominantly due to selective advantages including hypervirulence, hypermutability and increased transmissibility [22][23][24][25] . In comparison, the molecular-epidemiological study based on IS6110-RFLP conducted in Czech Republic in 2006 showed that only 7.1% of all isolates (11/155) belonged to the Beijing lineage, but represented up to 36% of all resistant isolates 9 . We assume that the prevalence of the Beijing lineage is evolutionary associated with migration events from Ukraine, which began in early 1990s. It reflects that the Ukrainian national minority is currently one of the largest membership bases in the Czech Republic 26 . In general, the prevalence of Beijing strains among MDR isolates is of growing concern, especially due to evolutionary success in terms of dispersion and adaptation to different human populations; therefore, appropriate measures should be taken for public health surveillance 27 .
In relation to resistance, XDR-TB was significantly more often detected in MDR-TB strains of the Beijing genotype (9/10; 90%) than in MDR-TB strains of non-Beijing genotypes (1/10; 10%). These data are supported by a study by Beer et al., which confirmed a significantly increased proportion of Beijing genotype (84%) among  29,30 . Moreover, pDST was performed on all samples; therefore, the identified gene variants can potentially contribute to increasing the sensitivity of currently used line-probe assays and developing new genotypic arrays. In comparison with the available catalogues summarizing the mutations associated with resistance, we identified two novel potential resistance-confirming indels in the pncA gene (287_287del; 524_524del) 29,31 . www.nature.com/scientificreports/ These mutations require further detailed investigation to determine their clinical relevance and impact on minimal inhibitory concentration to pyrazinamide. Additionally, we reported the first two strains of M. tuberculosis exhibiting genotypic resistance to new antituberculosis drug delamanid. The resistance was encoded by mutations in ddn gene. Interestingly, delamanid-resistant strains were isolated from patients without a prior treatment regimen containing this drug, probably due to a high rate of spontaneous mutations in the target gene 32 . These data are also supported by other studies that have confirmed resistance in patients without previous exposure to this drug 21,33 . Our results highlight the need to determine susceptibility to delamanid in clinical laboratories in order to prevent the spread and selection of delamanid-resistant strains. Phylogenetic analysis of M. tuberculosis genomes from subsequent samples from the same patient confirmed reinfection with a different strain of M. tuberculosis (difference more than 12 SNPs) in patient 800-17/541-18 (man, 31/32 years old). The same analysis from isolates from patient 628-17/149-18 (man, 37/38 years old) showed a difference of 0 SNPs between strains. However, additional genotypic resistance to linezolid developed within 1 year (Fig. 2, Supplementary Table 2). We hypothesize that mutations associated with resistance to linezolid developed due to non-adherence to treatment regimen or incorrect combination and dosing of antituberculosis drugs 34 .
The overall sensitivities and specificities for WGS, as mentioned in the results ("Drug susceptibility testing" section), demonstrated more than 80% concordance of the results obtained by WGS with pDST in the determination of resistance to isoniazid, rifampicin, pyrazinamide, streptomycin, second-line injectables, and fluoroquinolones, which is consistent with other available studies. studies  . The lower specificity of WGS in ethambutol resistance is due to the genotypic resistance of 23 phenotypically sensitive isolates. These isolates showed mutations in the embB gene, especially at codons 306, 406 and 497. Mutations at these codons were also identified in phenotypically resistant isolates. This phenomenon is not uncommon and has been observed in other studies, as these mutations may only slightly increase the minimum inhibitory concentration and thus lead to false negative pDST results [38][39][40] .
After cluster analysis, six clusters containing a total of 19 strains were identified (clustering rate 29.23%), most of which were formed by Beijing lineage isolates. In general, the clustering rate in our study was lower compared to the continuous 20 years genotyping study of MDR-TB in Spain (clustering rate 48.4%), 4-years drug-resistant TB WGS study in Thailand (48.5%) and the 3-years study involving MDR isolates from European Union countries (clustering rate 51.6%) [41][42][43] . These findings confirm the discriminatory power of WGS in terms of the definition of clonal complexes and the diversity of genotypes circulating in the country and highlight the need for a public health service to control and diagnose resistant TB. Furthermore, the results suggest that we lack some isolates within the clusters, or that resistance has developed due to inappropriate TB treatment regimens, as well as patient non-compliance. The higher clustering rate exhibited by the Beijing lineage (31.81%) compared to the Euro-American lineage (25%) is consistent with previous recent studies 44,45 . Also, more than 50% of clustered isolates are located in the Prague region, the capital of the Czech Republic with the largest proportion of www.nature.com/scientificreports/ foreign population due to employment. The overall clustering rate (29.23%) was almost half lower compared to study using Mycobacterium tuberculosis-specific multiple locus variable number tandem repeat (MIRU-VNTR) method for evaluation the genetic relatedness of MDR isolates collected in the Czech Republic during the years 2003-2011 28 . This may be due to false clustering and lower resolution of transmission using MIRU-VNTR in countries with low TB incidence 46 . The list of mutations associated with resistance to first-line antituberculosis drugs in clustered isolates showed that their transmission is likely to occur at the MDR-TB level. Among the first-line antituberculosis drugs, the greatest mutational variability in clustered isolates was demonstrated in the embB (encoding resistance to ethambutol) and pncA (encoding resistance to pyrazinamide) genes. In addition, mutations in the pncA gene were also unique in most of unclustered isolates. Similar diversity has been confirmed in other studies, supporting the theory that pncA mutations still remain the leading cause of pyrazinamide resistance 47,48 . Also, 4 isolates integrated within cluster 1 harboured an additional mutation in the rpoC gene. In contrast, a detailed look at mutations encoding resistance to second-line antituberculosis drugs (predominantly mutations in gyrA gene that confer resistance to fluoroquinolones and rrs gene that confer resistance to aminoglycosides) revealed several differences in isolates within the same cluster (cluster 1, 3, 4, 5; Supplementary  Tables 1, 2). This indicates that these clustered strains evolved differently due to non-adherence to the treatment regimen, rather than by recent transmission events. Linezolid resistance (mutation rplC C154A) was confirmed in 2 isolates within the cluster 3. These findings highlight the need to control linezolid dosing in a resistant TB treatment regimen and support the theory of linezolid resistance association with the Beijing lineage 49,50 . However, our study has some limitations. Several XDR/MDR cultures could not be reactivated in MGIT medium even after several attempts. In addition, WGS results could not be compared with other genotypic methods. Nevertheless, this study confirmed the utility of WGS in surveillance of drug resistant TB strains circulating in the Czech Republic. Sequencing data analysis identified complete resistance profiles of MDR and XDR strains of M. tuberculosis, suggesting that the implementation of this methodology in routine diagnostics could improve TB control at the national level.

Conclusion
This study highlights the relevance and utility of WGS as a high-resolution approach to perform genetic analysis of drug-resistant TB in the Czech Republic. WGS has detected gene mutations encoding resistance to first-line and second-line antituberculosis drugs (including delamanide and linezolid), demonstrating its clinical importance in TB therapy. In addition, our study provided a baseline genotypic characterization of the MDR-and XDR-TB strains circulating in the Czech Republic and, in combination with epidemiological information, revealed several transmission chains that will improve the surveillance activities in the country. Bioinformatics analysis. Paried-end Illumina read data was first prepared for analysis in the following manner: we used trimmomatic (v0.39) to remove any adapter fragments and low-quality tail-ends with quality phred-scores ≤ Q3 52 . All reads shorter than 35 bp in length were discarded. We then used Kraken2 (v2.2.1) to screen libraries for any contaminating reads that were assigned to taxa outside of the Mycobacterium tuberculosis complex using a custom database 53 . The remaining reads were quality-assessed using seqkit 3 (v0.15) 54 . To assign isolates to MTBC lineages, and identify antibiotic resistance markers (SNPs, indels and frameshifts), we used TB-profiler (v2.8.6; using the M. tuberculosis H37Rv genome as the mapping reference) 29 . We identified one isolate (CZ445-15) that consisted of a mix of two lineages, which was excluded from the rest of the analyses.

Materials and methods
For phylogenetic and cluster analysis we identified SNPs in the following manner: Reads were mapped against the hypothetical MTBC ancestor reference genome, described previously, using BWA (v0.7.17), and de-duplicated, to remove optical PCR duplicate reads, using picard tools (v2.12; https:// broad insti tute. github. www.nature.com/scientificreports/ io/ picard/) 55 . We then used samtools to identify fixed variants with a frequency of at least 70% and removed any variant identified within repetitive genomic regions (PE/PPE), insertion sequences, as well as prophages. A concatenated alignment of 3657 identified SNP positions was then generated from the resulting vcf-files which was used to generate a distance matrix of SNP distances using the dna.dist function implemented in the R package ape 56 . We used IQ-TREE (v1.6) to construct a maximum likelihood phylogeny using an GTR + F + ASC evolutionary model. The IQ-tree software searches through several evolutionary models and selects for a "best-fit model" using the Bayesian Information Criterion (BIC) score assigned to each tested model. The GTR + F + ASC model consistently scored the highest. We then used the iTol web-based phylogeny tool (v5) to visualize the tree 57,58 . A distant matrix was generated from MTBseq and a minimum spanning tree was constructed using GrapeTree software (https:// github. com/ achtm an-lab/ Grape Tree) with the maximum distance threshold of 12 SNPs (due to reporting an analysis of strains collected in 15 years timeframe) for linked transmission 59 www.nature.com/scientificreports/