Population structure and spatial distribution of Mycobacterium tuberculosis in Ethiopia

Ethiopia is one of the countries with a high tuberculosis (TB) burden, yet little is known about the spatial distribution of Mycobacterium tuberculosis (Mtb) lineages. This study analyzed spoligotyping data from 1735 archived Mtb isolates collected for the National Drug Resistance Survey between November 2011 and June 2013 to investigate Mtb population structure and spatial distribution. Spoligotype International Types (SITs) and lineages were retrieved from online databases. The distribution of lineages was evaluated using Fisher's exact test and logistic regression models. The Global Moran's Index and the Getis-Ord Gi statistic were used to identify hotspot areas. Interpretable spoligotypes were obtained for 91% of the isolates, including 4% with multidrug/rifampicin-resistant (MDR/RR) TB, and these comprised 4 lineages and 283 spoligotype patterns. The identified Mtb lineages were lineage 1 (1.8%), lineage 3 (25.9%), lineage 4 (70.6%) and lineage 7 (1.6%). The proportions of lineages 3 and 4 varied by region: lineage 3 was significantly more common than lineage 4 in Gambella (AOR = 4.37, P < 0.001) and Tigray (AOR = 3.44, P = 0.001), whereas lineage 4 was significantly more common than lineage 3 in the Southern Nations Nationalities and Peoples Region (AOR = 1.97, P = 0.026). Hotspots for lineage 1 were located in eastern Ethiopia, while lineage 7 hotspots were identified in northern and western Ethiopia. The five most prevalent spoligotypes (SIT149, SIT53, SIT25, SIT37 and SIT26) accounted for 42.8% of all isolates investigated, while SIT149, SIT53 and SIT21 accounted for 52–57.8% of drug-resistant TB cases. TB and drug-resistant TB are mainly caused by lineages 3 and 4, and the prevalent spoligotypes account for significant proportions of both drug-resistant TB and the total TB burden. Regional variations in lineages may result from both local and cross-border spread.


Study setting and participant enrollment
Ethiopia is a country in East Africa. It borders six countries: Sudan, Eritrea, Djibouti, Somalia, Kenya, and South Sudan 12. Administratively, the country is divided into four levels: regions, zones, woredas (districts) and kebeles (wards). The present study used Mtb isolates obtained from the DRS, which were collected from all regions. The DRS was carried out in 32 health facilities between November 2011 and June 2013. Its target population was smear-positive TB patients, and a total of 1785 such patients were included. Ninety-seven percent of the isolates (n = 1735) were available for spoligotyping.

Spoligotyping
All available isolates underwent spoligotyping. Spoligotyping was performed with a commercially available kit (Qiagen and Sigma) following the manufacturer's instructions 13,14. Each run included positive controls (H37Rv and M. bovis) and a negative control (water). Data were double-entered, and any inconsistent readings were cross-checked. A Spoligotype International Type (SIT) was assigned using the SITVIT2/MIRU-VNTRplus or SpolLineages databases. Lineages were extracted from SpolLineages 15. Spoligotypes sharing an identical spoligotype pattern were defined as clustered spoligotypes 16.
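Spoligotype patterns are commonly exchanged either as 43-character binary strings (one character per spacer) or as the equivalent 15-digit octal code used by databases such as SITVIT2. As a minimal sketch of that conversion (the function name `spoligo_octal` is hypothetical and is not part of any database API), spacers 1–42 are read as 14 binary triplets, each written as one octal digit, and spacer 43 is appended as the final digit:

```python
def spoligo_octal(pattern: str) -> str:
    """Convert a 43-character spoligotype string ('1' = spacer present,
    '0' = spacer absent) into the standard 15-digit octal code.

    Hypothetical helper for illustration only.
    """
    if len(pattern) != 43 or set(pattern) - {"0", "1"}:
        raise ValueError("expected exactly 43 characters of '0' or '1'")
    # Spacers 1-42: 14 consecutive triplets, each a 3-bit binary number
    # written as a single octal digit (0-7).
    digits = [str(int(pattern[i:i + 3], 2)) for i in range(0, 42, 3)]
    # Spacer 43 is appended unchanged as the 15th digit (0 or 1).
    digits.append(pattern[42])
    return "".join(digits)
```

For example, the H37Rv control pattern, in which spacers 20–21 and 33–36 are absent, converts to 777777477760771. Because identical octal codes correspond to identical patterns, grouping isolates by this code is one way to apply the cluster definition used here.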

Spatial analysis
The geocodes of the health facilities were obtained from the Centers for Disease Control and Prevention (CDC) Ethiopia. We used ArcGIS (version 10.8) for the spatial analysis, taking each health facility as a single spatial unit. The Global Moran's Index was applied to assess the spatial distribution of lineages, and the Getis-Ord Gi statistic was used to identify hotspots. Spatial relationships were defined using a fixed distance band with the default threshold distance, based on Euclidean distance.
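Global Moran's I compares each facility's value with the values of the facilities that fall inside the distance band. The sketch below is illustrative only: it assumes binary, unstandardized fixed-distance-band weights and, as the analyzed variable, a hypothetical per-facility proportion of isolates belonging to a given lineage. ArcGIS's own options (for example row standardization and its analytic z-score) may differ, and significance is assessed here with a permutation test, swapped in for the analytic z-score that ArcGIS reports. All function names are hypothetical.

```python
import math
import random


def distance_band_weights(coords, threshold):
    """Binary fixed-distance-band weights: w_ij = 1 when facilities i != j
    lie within `threshold` (Euclidean distance), otherwise 0."""
    n = len(coords)
    return [[1.0 if i != j and math.dist(coords[i], coords[j]) <= threshold else 0.0
             for j in range(n)] for i in range(n)]


def morans_i(values, weights):
    """Return Global Moran's I and its expectation -1/(n-1) under the null
    hypothesis of no spatial autocorrelation."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]          # deviations from the mean
    s0 = sum(map(sum, weights))             # sum of all weights
    denom = sum(zi * zi for zi in z)
    if s0 == 0 or denom == 0:
        raise ValueError("need at least one neighbour pair and non-constant values")
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / denom, -1.0 / (n - 1)


def permutation_pvalue(values, weights, n_perm=999, seed=1):
    """One-sided permutation p-value for positive autocorrelation: the share
    of random relabellings whose I is at least the observed I."""
    rng = random.Random(seed)
    observed, _ = morans_i(values, weights)
    shuffled = list(values)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, weights)[0] >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)
```

Values of I near the expectation -1/(n-1) indicate no spatial autocorrelation, while clearly positive values indicate that facilities with similar lineage proportions lie close together.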

Statistical analysis
Data were captured and analyzed using SPSS v20 (IBM SPSS Statistics 20). The distribution of lineages and predominant spoligotypes (n > 20) 9 by drug resistance profile was evaluated using Fisher's exact test. Bivariable and multivariable logistic regression models were used to assess the association of the prevalent lineages (lineages 3 and 4, together 96.5%) with demographic, clinical, drug resistance and location profiles. Variables with a p-value ≤ 0.2 in the bivariable analysis were entered into the multivariable logistic regression model. For the logistic regression analysis, the region whose proportions of lineages 3 and 4 were closest to the national average was selected as the reference. A p-value < 0.05 in Fisher's exact test and in the multivariable logistic regression was considered statistically significant.
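For a 2x2 table (for example, one lineage versus the rest, cross-classified by MDR/RR status), Fisher's exact test sums the hypergeometric probabilities of all tables that share the observed margins. A minimal sketch using only the Python standard library follows; the function name and the counts in the example are hypothetical, and tables larger than 2x2 would need the Freeman-Halton generalization instead.

```python
import math


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns (sample odds ratio, two-sided p-value).
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    # Every table with the observed margins is fixed by its top-left cell x;
    # its probability is comb(row1, x) * comb(row2, col1 - x) / comb(n, col1).
    lo, hi = max(0, col1 - row2), min(row1, col1)
    weights = {x: math.comb(row1, x) * math.comb(row2, col1 - x)
               for x in range(lo, hi + 1)}
    observed = weights[a]
    # Two-sided p-value: total probability of tables no more likely than the
    # observed one. Integer weights make this comparison exact.
    p_value = sum(w for w in weights.values() if w <= observed) / math.comb(n, col1)
    odds_ratio = (a * d) / (b * c) if b * c else math.inf
    return odds_ratio, p_value
```

With the hypothetical table [[8, 2], [1, 5]], the sketch returns an odds ratio of 20 and a two-sided p-value of about 0.035.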

Ethical approval
This study obtained ethical approval from the Ethiopian Public Health Institute (SERO-59-5-2016) and the Addis Ababa University Ethics Committee Institutional Review Board (SF/MCMB/702/08/2016). Only stored isolates were used. No personal identifiers were collected; the DRS enrolled patients using unique survey identification numbers.

Study isolates
A total of 91% (1579/1735) of the isolates, including 68 MDR/RR and 58 INH-resistant Mtb isolates, showed interpretable spoligotype patterns, each from a different patient. Both spoligotype and drug susceptibility results were available for 1402 isolates. SITs were retrieved for 88% (1393/1579) of the isolates, which displayed 283 distinct spoligotype patterns in total. Ninety percent of the isolates (1423/1579) were grouped into 127 clustered spoligotype patterns with cluster sizes of 2–247 (Supplementary Table S1); the remaining 156 isolates had unique patterns.
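Cluster figures of this kind follow directly from counting identical patterns. A minimal sketch (the function name and the example labels are hypothetical) that computes the number of distinct patterns, clustered and unique isolates, the cluster size range and the share of the top-k patterns, the same kind of figure as the 42.8% attributed to the five predominant SITs:

```python
from collections import Counter


def cluster_summary(patterns, top_k=5):
    """Summarize spoligotype clustering for a list holding one pattern
    (e.g. an octal code or SIT label) per isolate."""
    if not patterns:
        raise ValueError("no patterns supplied")
    counts = Counter(patterns)
    # A cluster is a pattern shared by two or more isolates.
    cluster_sizes = [c for c in counts.values() if c >= 2]
    n = len(patterns)
    clustered = sum(cluster_sizes)
    top = counts.most_common(top_k)  # the k most frequent patterns
    return {
        "isolates": n,
        "distinct_patterns": len(counts),
        "cluster_patterns": len(cluster_sizes),
        "clustered_isolates": clustered,
        "unique_isolates": n - clustered,
        "cluster_size_range": (min(cluster_sizes), max(cluster_sizes)) if cluster_sizes else None,
        "top_k_share": sum(c for _, c in top) / n,
    }
```

By construction, the number of distinct patterns equals the number of cluster patterns plus the number of unique isolates; with the counts reported in this study, 127 + 156 = 283.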

Spatial distribution of Mtb lineages
The proportions of the lineages varied across regions (Table 2). Dire Dawa was used as the reference because its proportions of lineage 3 (28.6%) and lineage 4 (71.4%) were closest to the national averages. In the bivariable analysis, lineage 3 was less likely than lineage 4 to be found in Oromia, but more likely to be found in Gambella, Southern Nations Nationalities and Peoples Region (SNNPR), and Tigray. However, only Gambella, SNNPR, and Tigray showed significant differences in the multivariable analysis (Table 3). Accordingly, lineage 3 was significantly more common than lineage 4 in TB patients from Gambella (AOR = 4.37, P < 0.001) and Tigray (AOR = 3.44, P = 0.001), whereas lineage 4 was significantly more common than lineage 3 in patients from SNNPR (AOR = 1.97, P = 0.026).
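These odds ratios come from logistic regression of lineage 3 against lineage 4 with Dire Dawa as the reference region. The study used SPSS; the numpy sketch below only illustrates the same idea under stated assumptions: an outcome coded 1 for lineage 3 and 0 for lineage 4, region indicator variables that omit the reference category, Newton-Raphson (IRLS) fitting, and Wald 95% confidence intervals and p-values. All function names and counts are hypothetical.

```python
import math

import numpy as np


def dummy_code(labels, reference):
    """0/1 indicator columns for every category except `reference`."""
    levels = sorted(set(labels) - {reference})
    X = np.array([[1.0 if lab == lev else 0.0 for lev in levels] for lab in labels])
    return X.reshape(len(labels), len(levels)), levels


def fit_logistic(X, y, tol=1e-10, max_iter=50):
    """Logistic regression by Newton-Raphson (IRLS); an intercept is added.
    Returns the coefficients and their standard errors."""
    X = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        info = X.T @ (X * (p * (1.0 - p))[:, None])  # Fisher information
        step = np.linalg.solve(info, X.T @ (y - p))   # score / information
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return beta, se


def odds_ratios(beta, se, names):
    """Odds ratio, Wald 95% CI and two-sided p-value per non-intercept term."""
    rows = []
    for b, s, name in zip(beta[1:], se[1:], names):
        p_value = math.erfc(abs(b / s) / math.sqrt(2))
        rows.append((name, math.exp(b), math.exp(b - 1.96 * s),
                     math.exp(b + 1.96 * s), p_value))
    return rows
```

With a single binary covariate, the fitted odds ratio equals the crude 2x2 odds ratio, which makes a convenient check; adding further indicator columns (for example, sex) to X yields adjusted odds ratios, as in the multivariable model.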

Hotspot analysis of Mtb lineages
The Global Moran's I test showed statistically significant spatial autocorrelation in the distribution of lineage 7 (Table 4). Lineage 4 showed hotspots in southern Ethiopia, although these did not reach the 95% confidence level. The hotspot analysis is displayed in Fig. 1. Lineage 1 hotspots were identified in eastern Ethiopia, while lineage 7 hotspots were identified in the northern and western parts of Ethiopia.
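Hotspots are usually declared from Getis-Ord Gi* z-scores, with |z| ≥ 1.96 corresponding to the 95% confidence level. The sketch below assumes the Gi* form used by ArcGIS's Hot Spot Analysis tool, in which each unit is counted within its own neighborhood, together with binary fixed-distance-band weights; the function names and the per-facility values in the example are hypothetical.

```python
import math


def getis_ord_gi_star(coords, values, threshold):
    """Gi* z-score for each unit, using binary fixed-distance-band weights
    that include the unit itself (w_ii = 1, since its distance is 0)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    if s == 0:
        raise ValueError("values are constant; Gi* is undefined")
    z_scores = []
    for i in range(n):
        w = [1.0 if math.dist(coords[i], coords[j]) <= threshold else 0.0
             for j in range(n)]
        sw = sum(w)
        sw2 = sum(wj * wj for wj in w)
        # Observed local sum minus its expectation, scaled by its standard error.
        num = sum(wj * xj for wj, xj in zip(w, values)) - mean * sw
        den = s * math.sqrt((n * sw2 - sw ** 2) / (n - 1))
        z_scores.append(num / den)
    return z_scores


def classify(z, critical=1.96):
    """Label a Gi* z-score at the 95% confidence level."""
    if z >= critical:
        return "hot spot (95%)"
    if z <= -critical:
        return "cold spot (95%)"
    return "not significant"
```

In a row of six facilities where the first three share a high value and the last three a low one, the middle facility of each group reaches |z| ≈ 2.24 (a hot and a cold spot, respectively), whereas the remaining facilities fall below the 95% threshold.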

Demographic factors associated with the dominant lineages
In the bivariable analysis of the prominent lineages (lineages 3 and 4), lineage 3 was more likely than lineage 4 to be associated with male sex, HIV-positive status, and retreatment TB cases (Table 3). Multivariable analysis, however, revealed that only male sex showed a significant association: male TB patients had higher odds than female patients of having lineage 3 rather than lineage 4 (AOR = 1.37, P = 0.016).
www.nature.com/scientificreports/

Discussion
This study reports on the population structure and spatial distribution of Mtb using isolates collected for the Drug Resistance Survey. Our study showed that 4 of the 9 Mtb lineages were circulating in Ethiopia, with lineages 3 and 4 as the major lineages, including among MDR/RR and INH-resistant TB. Furthermore, the three predominant spoligotypes (SIT149, SIT53, SIT21) accounted for a noteworthy proportion of MDR/RR (57.8%) and INH-resistant (52%) TB, whilst the five predominant spoligotypes (SIT149, SIT53, SIT25, SIT37, and SIT26) accounted for a substantial portion of overall TB cases (42.8%). The high frequency of some spoligotypes suggests possible clonal expansion of those spoligotypes in the country. Furthermore, the spatial analysis reveals that the distribution of the lineages varies by region, which is essential knowledge for improving collaborative planning of TB program activities between the regions of Ethiopia and with neighboring countries.
Examining Mtb strain diversity and distribution allows for a better understanding of TB transmission dynamics and the identification of highly transmissible genotypes [6][7][8][9]. Certain lineages and spoligotypes have been more prevalent in particular locations; for instance, T2/Uganda II 17, T3ETH 18, and EAI2-Manila 19 have been reported in Uganda, Ethiopia, and the Philippines, respectively. Furthermore, previous studies found that 4–5 predominant spoligotypes accounted for 42–55% of isolates 18,[20][21][22]. In line with these studies, our results indicated that 42.8% of the isolates investigated belonged to the five dominant SITs. Additionally, we found that the three predominant spoligotypes made up a major share of MDR/RR (57.8%) and INH-resistant (52%) TB isolates. Of MDR cases, 60.8% (n = 146/240) in India and 58.2% (n = 100/134) in Zambia were from the three predominant spoligotypes 21,22. This might be the outcome of the competitive fitness of the strains and of host-pathogen coevolution, which increase the likelihood that local strains will spread in patient groups within the same nations 5,23.
The population structure of Mtb lineages is unique to each country, and the distribution of lineages within a country is also distinct 4,7,17,25. In the present study, the proportions of lineages 3 and 4 varied among regions, with lineage 3 being significantly more common than lineage 4 in Gambella and Tigray, and lineage 4 significantly more common than lineage 3 in SNNPR. Our results are consistent with reports of spatially varied TB lineages within a country. The two prominent lineages in South Africa, lineages 1 and 4, show spatial heterogeneity across provinces 25. In Uganda, the Uganda II family is found primarily in the south-west zone compared with other zones 17. Furthermore, our data also reflect the dominant lineages in neighboring countries, such as Sudan (lineage 3), which borders Gambella, and Kenya (lineage 4), which borders SNNPR 26,27.
Lineage 7 is a geographically restricted lineage reported almost exclusively in Ethiopia 18,28. We identified a lineage 7 hotspot in northern and western Ethiopia. Prior studies report that lineage 7 is more common in northern Ethiopia (13–15.6%) than in other parts of the country (< 0.6%) 19,[28][29][30], which is consistent with our findings. However, lineage 7 accounted for 0–1.9% of isolates in western Ethiopia, where, to the best of our knowledge, it has not previously been documented; this warrants further research. Beyond its geographic restriction, a recent study revealed that reduced protein abundance in lineage 7 may contribute to its slower growth and less virulent phenotype 10. This might explain why lineage 7 is transmitted only in certain locations, even though the lineage is known for host-pathogen coevolution in Ethiopian TB patients. Our findings indicated a hotspot for lineage 1 in eastern Ethiopia. Previous studies from eastern Ethiopia show that lineage 1 made up 4.7–8.4% of isolates, whereas a multicenter analysis found that it made up 1.1% 18,29,31. Furthermore, studies employing sizable data sets revealed that lineage 1/EAI was significantly more common in Somalia (33.63%), which borders eastern Ethiopia 32. Local and cross-border transmission in the eastern region of Ethiopia may therefore account for the increased occurrence of lineage 1.
A sex disparity has been reported in the overall TB burden: the majority (55%) of the global burden, as well as more than 50% of TB in Ethiopia, occurs in males 2. An experimental study showed that males acquire Mtb infection far earlier than females because of sex differences in B cell follicle growth 33. Our results showed that lineage 3, in contrast to lineage 4, was more common in male than in female TB patients (AOR = 1.37), which may be partially attributed to male susceptibility. Although lineage 4 was the predominant lineage in our study, no sex disparity was observed for it. The variation may be due to additional underlying patient risk factors, which calls for more carefully controlled research.
This study has limitations. First, spoligotyping lacks discriminatory power, so using it to describe the percentage of predominant spoligotypes may overestimate the number of isolates belonging to the same strain. Second, although we included health facilities from every region, the samples collected from each region are not representative of the entire region.

Conclusion
This study reported the population structure of Mtb using samples from all regions of Ethiopia. The Mtb population comprises lineages 1, 3, 4, and 7, with lineages 3 and 4 accounting for the majority of INH-resistant TB, MDR/RR TB, and overall TB cases. The five predominant spoligotypes account for a significant portion of all TB cases (42.8%), while the three predominant spoligotypes are responsible for a notable percentage of MDR/RR (57.8%) and INH-resistant (52%) TB. This may reflect a relative transmissibility advantage of those spoligotypes, which would influence drug-resistant TB as well as the total TB burden. The regional variation in lineages and their similarity with those of neighboring countries suggest both local and cross-border spread of TB, which is likely the result of the bacterial genetic background of the lineages and/or human population movement.

Figure 1. Geographical distribution of hot and cold spots of Mycobacterium tuberculosis lineages with confidence intervals.

Table 1. Distribution of lineages and predominant spoligotypes in rifampicin- and/or isoniazid-resistant tuberculosis. a The total number represents isolates with lineage information. b The total number represents isolates with spoligotype results.

Table 2. Proportion of the four lineages by region. No. number, L lineage, SNNPR Southern Nations Nationalities and Peoples Region.

Table 3. Logistic regression analysis of lineage 3 compared with lineage 4 by demographic, clinical, drug resistance and regional variables. NA not applicable, G Gumuz, L3 lineage 3, L4 lineage 4, HIV human immunodeficiency virus, COR crude odds ratio, AOR adjusted odds ratio. *The AOR for L4 compared with L3 was 1.97.

Table 4. Global spatial autocorrelation results of Mtb lineages.