Population structure and genetic diversity of Mycobacterium tuberculosis in Ecuador

Tuberculosis (TB) is a significant public health problem in Ecuador with an incidence of 43 per 100,000 inhabitants and an estimated multidrug-resistant-TB prevalence in all TB cases of 9%. Genotyping of Mycobacterium tuberculosis (MTBC) is important to understand regional transmission dynamics. This study aims to describe the main MTBC lineages and sublineages circulating in the country. A representative sample of 373 MTBC strains from 22 provinces of Ecuador, with data comprising geographic origin and drug susceptibility, were genotyped using 24 loci-MIRU-VNTR. For strains with an ambiguous sublineage designation, the lineage was confirmed by Regions of Difference analysis or by Whole Genome Sequencing. We show that lineage 4 is predominant in Ecuador (98.3% of the strains). Only 4 strains belong to lineages 2-sublineage Beijing and two strains to lineage 3-sublineage Delhi. Lineage 4 strains included sublineages LAM (45.7%), Haarlem (31.8%), S (13.1%), X (4.6%), Ghana (0.6%) and NEW (0.3%). The LAM sublineage showed the strongest association with antibiotic resistance. The X and S sublineages were found predominantly in the Coastal and the Andean regions respectively and the reason for the high prevalence of these strains in Ecuador should be addressed in future studies. Our database constitutes a tool for MIRU-VNTR pattern comparison of M. tuberculosis isolates for national and international epidemiologic studies and phylogenetic purposes.

More recently, whole-genome sequencing (WGS) has been introduced as the most accurate method for studying molecular epidemiology of MTBC. However, WGS is still a challenging and expensive approach for developing countries like Ecuador. MIRU-VNTR is considered a gold standard for MTBC genotyping, showing an acceptable discriminatory power and reproducibility for studying the genetic diversity and biogeographical structure of MTBC at a local, regional or national level 5 .
Although the MTBC genetic diversity has been well described for some countries of the Americas, there is an important lack of information for others 6 , and there is little information on the genetic diversity of M. tuberculosis in Ecuador. Two reports showed a low prevalence of the Beijing family in the country 7,8 and another study analyzed strains from the capital Quito showing a high prevalence of lineage 4 9 . Although this latter was a pioneer study in addressing the population structure of MTBC in Ecuador, it has limitations due to the limited sample size (104 isolates), drug resistance profile (all isolates were resistant to isoniazid, rifampicin or both) and location (all strains came from one single hospital in Quito). The present work aimed to study the genotypic diversity of strains of MTBC in Ecuador with a countrywide MTBC strain collection and compare our results with reports from other Latin-American countries.

Materials and Methods
ethics statement. The study was approved by the ethics committee of University San Francisco de Quito (code 2017-023IN) and Instituto Nacional de Salud Pública e Investigación Leopoldo Izquieta Pérez" (INSPI), both certified by Ministry of Public Health from Ecuador following guidelines from Declaration of Helsinki. All samples were anonymized and no data of the patients were made available.
the "Laboratorio de Referencia de Micobacterias" at the "Instituto Nacional de Salud Pública e Investigacion (INSPI) Leopoldo Izquieta Pérez" in Ecuador. Strains were provided by two public INSPI laboratories in Quito and Guayaquil, the only public laboratories in Ecuador that carry out mycobacterial culture and Guayaquil the only laboratory in Ecuador that performs drug resistance testing for the first-and second-line TB antibiotics. Consequently, MDR samples were usually much more likely to be collected by these laboratories. Strains from all provinces were included with exception of Imbabura and Carchi; no strains were available for these provinces, however, these provinces reported less than one percent of total national TB caseload. Ecuador comprises 4 distinct geographical regions: Andean, Costal, Amazonia and Galapagos Islands. Since we obtained only 1 strain from the Galapagos Islands we will refer to 3 geographical regions henceforth.
The strains included in this study were isolated in the years: 2013 (n = 41), 2014 (n = 121), 2015 (n = 179), and 2016 (n = 32) ( Table 1). The sample size for each province was calculated using standard statistical methods 8 . Briefly, a cluster-randomized sampling method with an error = 0.05 and z = 1.96 was used to obtain a representative sample and in proportion with the number of reported smear-positive cases in 2013 and reported MDR case for each province. See Supplementary Table 1 for details of the strain distribution.
Drug susceptibility testing. Drug susceptibility testing for first-line drugs was performed using the proportion method of Canetti et al. on L-J medium with a minimal inhibitory concentrations of 40 µg/mL for RIF, 0.2 µg/mL for isoniazid, 4 µg/mL for streptomycin, 0.4 mg/mL for ethambutol and 200 µg/mL for pyrazinamide. The MTBC H37Rv strain was used as a control strain for drug susceptibility testing. DNA extraction. Several colonies from isolates grown in Ogawa-Kudoh medium were resuspended in 500 µl of TE (10 mM Tris pH 8.0 1 mM EDTA pH 8.0), boiled at 95° for 45 minutes, and centrifuged at 10,000 × g for 5 minutes. Supernatants were collected and stored at −80 °C until use.
MIRU-VNTR genotyping. The 24-loci MIRU-VNTR typing was used in this study and a PCR for each of these loci was performed as has been previously described 10 . Products were analyzed by electrophoresis using 2% agarose gels, long runs of 3-4 hours and using 50 bp ladders as a band size reference. From the gel images, the corresponding MIRU-VNTR bands were interpreted, based on reference table indications, as copy numbers. To ratify the results, about 35% of the strains were analyzed two times.

Regions of difference (RD) PCR genotyping.
To confirm lineages and sublineages, especially for strains with an ambiguous MIRU-VNTR result, RD genotyping was performed for the chromosomal regions RD9, TbD1, RD4, RD5, RD10, RD12 RD 750, RD726, RD724, RD 239, RD105, RD702, RD711, IS1561, as previously described by Gagneux et al. 11 . Sanger sequencing was performed for the detection of the 7 bp deletion in the polyketide synthase locus (pKS1-15 RD) enabling the identification of Euro-American, Indo-Oceanic and Asian lineages [12][13][14] . The sequence analysis was carried out with the Geneious program (Biomatters); 253 of the 373 isolates (68%) were submitted to RD analysis including 220 strains representing all major and minor branches of the minimal spanning tree and 33 strains with a doubtful or ambiguous sublineage assignation.
Whole genome sequencing (WGS). WGS was carried out for 8 isolates with an ambiguous result for MIRU-VNTR and DR analysis Also, 3 Beijing strains were sequenced for confirmation and further studies. The Illumina Hiseq 4000 (Macrogen, South Korea) sequencer was used and Fastq files were analyzed using TGS-TB online platform available at https://gph.niid.go.jp/tgs-tb/, obtaining a coverage over 99% in all samples. A maximum-likelihood core genome phylogeny tree analysis was done with RAxML (Randomized Accelerated Maximum Likelihood) with 1000 X bootstrap test 15 . Whole-genome sequences reported in this study were deposited in GenBank under the accession numbers: VBWI00000000, VBWH00000000, VBVE00000000, VBVJ00000000, VBVK00000000, VBVI00000000, VBVD00000000, VBWK00000000, VBVG00000000, VBWJ00000000, VBVH00000000. phylogenetic analysis. To analyze the phylogenic relationship between the strains we carried out a minimum spanning tree (MST) analysis using the MIRU-VNTRplus web application available at https://www. miru-vntrplus.org/MIRU/index.faces. We used a categorical algorithm including both MIRU-VNTR and Regions of Difference data; stepwise weighted distance (D SW ) measurements were used when only MIRU-VNTR data was available 16 . The comparison was done with previously reported MIRU types; 186 strains representing all known lineages and sublineages and stored in the same database. Following the algorithm published by Allix-Béguec, the best match analysis with a cut off value of 0.16 and posterior confirmation by tree-based analysis 16 was performed using the numerical code MTBCC15-9 based on 15 discriminatory loci and 9 auxiliary loci, for sublineage assignment 17 .
Clonal complexes (CC) and clusters. Clonal Complexes were defined as strains with a maximum of two MIRU loci difference 18 . Cluster strains are strains with an identical MIRU-VNTR pattern.

Results
All 373 strains included in this study generated interpretable MIRU-VNTR patterns and these were compared with the database available in the MIRU-VNTR plus platform (www.miru-vntrplus.org). Because of inconclusive results, WGS was carried out in 8 strains showing MTBC sublineages: Haarlem (2 strains), 1 S strain, 5 strains of sublineage X. Also, we confirmed with WGS the three Beijing strains present in our strain collection. The sequenced S strain belonged to sublineage 4.4.1.1. The results of WGS can be found in Supplementary Fig. 1. Mutations associated with drug resistance can be found in Supplementary Table 4.
We found no statistical difference in the distribution of the TB sublineages LAM and Haarlem (Pearson X 2 = 10.242, df = 8, p = 0.248) for Ecuadorian geographical regions. The S lineage, with 49 isolates, was predominantly found in the Andean region, especially in Quito, and in the Amazonia, both regions with twice the prevalence in comparison with the Coastal region. The X sublineage, with 16 isolates, was predominantly found in the Coastal region (13 isolates), especially in Guayaquil, and only 1 and 2 isolates in the Amazonia and the Andean regions respectively ( Table 1). The Ghana and Delhi sublineages, both with two strains, were only found in the coastal region. In Fig. 1 we have visualized the distribution of MTBC genotypes for Ecuador and in Table 1 the details of the numbers and prevalence rates for the three main geographic regions of Ecuador are shown.
The sublineage distribution for the 2 largest cities in Ecuador, (Quito and Guayaquil, with about 2 million and 2.7 million inhabitants respectively) is shown in Table 2 and Fig. 1. There was a statistically significant difference (Pearson X 2 = 13.237, df = 4, p = 0.009) in LAM prevalence in Quito and Guayaquil with 31.2% (n = 19) and 53.0% (n = 71) respectively. The prevalence of the S strains in Quito was twice that of Guayaquil, conversely, the prevalence of X strains in Guayaquil was double than that in Quito.
Drug resistance profiles and MIRU-VNTR typing. Drug resistance profiles were available for 320 of the 373 MTBC isolates in this study ( Table 3). Most of the strains were susceptible for all 4 first-line drugs (n = 171, 53.6%), 78 (24.5%) were MDR-TB, and 3 (0.9%) were polyresistant. Excluding Beijing, Delhi, and New because of the low number of isolates, MDR prevalence among lineages was only statistically different (Pearson X 2 = 16.124, df 6, test, p = 0.013) for S strains with 10.3%, compared with LAM with 30.8%, X with 26.66% and Haarlem with 22.5%. For rifampicin resistance there was no statistically significant differences among the sublineages Haarlem (n = 5, 4.9%), LAM (n = 5, 3.4%), X (n = 1, 6.6%) and S (n = 1, 2.6%). A statistically significant differences  Table 3. Drug resistance profiles distribution among MTBC sublineages. 320 isolates with complete profiles were included; *indicates statistically significant differences (p < 0.05). For this analysis, we exclude the nine strains of the L4 lineages without a sublineage assignment. The map of the country shows the pie charts of the distribution of the major sublineages for the three main geographic regions, the coastal region including Guayaquil, the Andes region including Quito and the Amazon region. Also, in separated pie charts, the distribution of sublineages for the entire country and the two largest cities in Ecuador; Quito and Guayaquil (map was elaborated with sotfware Qgis 3.12; http://qgis.osgeo. org).

Clonal complexes and clusters. Nineteen Clonal Complexes (CC), defined as strains with a maximum
of two MIRU loci difference, were found encompassing 93 (24.9%) of the isolates. See Fig. 2 Table 2. Seven CC could be found only in Guayas province, comprising 52 of the 161 (32%) strains obtained from this Province. Other CCs were also found in Los Rios and El Oro provinces comprising 10 of the 44 (23%) isolates and 9 of the 21 isolates (43%) in these Provinces. This in contrast with the most densely populated province Pichincha were only 4 out of 61 strains (7%) formed part of a clonal complex. Moreover, 56 of the 171 LAM strains (33%) and 35 of the 118 Haarlem strains (30%) belonged to a CC, which may suggest no major differences in strain transmissibility. The biggest CC (comprising 22 LAM strains from Guayaquil) and another CC (comprising two S strains from Los Rios province) have been previously reported in Colombia 19,20 (supplementary  Table 2). Only six clusters (strains with an identical MIRU-VNTR pattern) compromising 16 strains were found, representing 4,3% of the strains (cluster index 2.6%) and an indication of potentially direct transmission. Five of these clusters were located in the Province of Guayas and one in Pichincha Province, Discriminatory power. The

Discussion
This investigation was undertaken to provide a nationwide analysis of the MTBC population structure in Ecuador. Only some countries in Latin America like Paraguay 21 [43][44][45][46] ) and/or isolates with drug resistance (Argentina Peru and an earlier study from Ecuador [47][48][49] or isolates from patients at high risk for drug resistance (Bolivia 34 ). Also, most of the previous studies were done with spoligotyping or a combination of spoligotyping and MIRU-VNTR. We used a combination of MIRU-VNTR and DR typing. For sublineages difficult to resolve (for example the X sublineage), we used WGS for confirmation.
The Ecuadorian strains we analyzed in this study belonged mostly to lineage 4 which is dominant in all South American countries 6 and it has been associated with the European colonization of Americas 50 . A low prevalence of two other lineages, namely 2 and 3 was found: lineage 2-Beijing has been detected in other countries in South America 6,30,33,34,51 and was mostly linked to Chinese immigrants. Beijing has a high prevalence in Peru and Colombia but a low prevalence in Ecuador, which is in agreement with 4 previous reports [7][8][9]52 .
Sublineages LAM and Haarlem (Lineage 4) are predominant in Ecuador which is similar to the rest of South America 53 . Distribution of LAM was homogenous in Ecuador except in the Andean region where it has been replaced by sublineage S; in Quito with a prevalence of 24.6% and the rest of the Ecuadorian Andean region with 13%. In South America, the S strains have been detected only in Paraguay (9.5% prevalence), Venezuela, and Brazil (prevalence 2-4%) 6 . Data from Paraguay showed that the S strain is responsible for outbreaks 21 . The S strain (or L4.4 lineage), has a moderate global distribution, previously observed in parts of Asia and Africa and it is the most common strain in New Zealand, which accounts for 43% of New Zealand born L4 cases 54 . Whole-genome sequencing of one Ecuadorian S strain placed this strain into the L4.4.1.1 sublineage clade which is prevalent in indigenous populations of Polynesia and Canada 54 . It will be interesting to determine if the other Ecuadorian S strains also belong to this sublineage clade and infects preferentially a specific population in Ecuador and this needs further research.
In our hands, the X sublineage did not always resolve well by MIRU-VNTR and DR typing and we confirmed this sublineage by WGS in 6 strains. Lineage 4 sublineage X (L4.1.1) has a 4% prevalence in Ecuador which is higher than that of other South American countries except for Rio Grande in Brazil and Chile 22 . The L4.1.1 (X) sublineage mainly occurs in the Americas and lower proportions in few countries of southern Africa, Asia, and Europe 53 . In Chile, the proportion of X lineage has been reported noticeably higher in certain provinces, in Iquique and Concepción (11.7% in both), as compared to Santiago the capital (1.6%) 24 . This difference in Chile can be explained because of immigration patterns, which have greatly increased in recent years from countries like Peru, Colombia or Ecuador, or a genetic predisposition that makes the population of these cities more susceptible to certain genotypes 55 . In Ecuador, the Coastal prevalence is twice as high as the Andean one. We also found one strain of the NEW sublineage (Sublineage L4.5); as far as we know this is the first description in South America 6 . This strain is frequent in China, Vietnam, and the Filipinas and present in Angola and the United States 53 . The migration of Ecuadorians to and from the United States is a possible explication for the presence of this strain in Ecuador. Only 2 isolates belonged to sublineage Delhi (lineage 3) which is found mainly in East Africa, the Middle East, and India.

Comparison of genotypes with another study from ecuador. A previous study that included 104
isolates from 1 hospital in Quito described six genotypes; LAM (33.7%) and Ghana (30.8%), S sublineage (5.8%), Cameroon (3.8%), Haarlem (3.8%), and Beijing (1.9%) 9 . Our study shows some discrepancies: they found 3.8% of Haarlem while we showed 37.7%; they found 30.8% of Ghana (mostly MDR), while we found 3.8% prevalence (non MDR); they found 3.8% prevalence of Cameroon sub-lineage which was absent in our study. Important Scientific RepoRtS | (2020) 10:6237 | https://doi.org/10.1038/s41598-020-62824-z www.nature.com/scientificreports www.nature.com/scientificreports/ differences among the studied populations can explain these differences. While we include 375 isolates from all over the country with a prevalence of drug-resistant strains of approximately 25%, the other study consisted of 104 strains, all drug-resistant isolates from a single hospital in Quito, Pichincha. This is a private hospital serving a specific population, introducing another, social and economic, bias. Our strains came from the INSPI institute serving the most needed of our population.
Clonal complexes and their distribution in ecuador, colombia and peru. The minimum spanning tree analysis distinguished 19 clonal complexes made up of 2 to 22 strains ( Fig. 2 and supplementary Table 2). Regarding the distribution of the clonal complexes (CC), only in 3 out of 18 of the CC were absent in Quito (Pichincha province) or Guayaquil (Guayas province), probably because these cities are important migration centers 56   conclusions This is the most comprehensive overview of the genetic diversity of M. tuberculosis in Ecuador to date and it is the first report of Delhi and New strains in South America. The MTBC population in Ecuador seems to have a limited diversity of sublineages among which LAM, Haarlem, and S are the most prevalent. The S genotype is an important lineage in the Andean and Amazonia regions, which may reflect an adaptation to the local population. Ongoing Venezuelan immigration, more than 4.5 million Venezuelans have left their country in the last 4 years and around 400.000 are living actually in Ecuador 62 , is likely to have implications for the pattern of tuberculosis genotypes in the future. The current baseline study provides information for future studies and our database constitutes a tool for MIRU-VNTR pattern comparison of M. tuberculosis isolates for national and international epidemiologic studies and phylogenetic purposes.
Limitation of this study. No complementary spoligotyping data of our strains are available to confirm lineages and sublineages in this study though we confirmed the phylogenic status of an important part of our strains using RD and WGS. Our study has a relatively high prevalence of MDR-MTBC isolates (25.7%) which has introduced a bias factor. MDR TB prevalence has been reported to be 9% in Ecuador 63 . Since MDR is positively associated with LAM genotypes this would let to an overestimation of the presence of LAM in our study. No analysis associating age, sex, HIV status, diabetes and other predisposition factors for tuberculosis with certain genotypes could be established as data of patients were not available respecting patient privacy. S strains were easily identified with MIRU-VNTR typing but because this sublineage is not very common; a single isolate of the S strains (n = 49) was sequenced, indicating that this strain belonged to L4.4.1.1. The other S strains of this study or a part should also be sequenced to confirm this lineage assignment. It is well known that VNTR types are homoplastic and a recent study clearly showed that the many VNTR "clades" were not monophyletic, belonging to several clades when examined by WGS. This demonstrates the unreliable nature of defining a sub-clade based on MIRU-VNTR alone 64 .

Data availability
Data from this study will be made fully available and without restriction upon request.