Analysis of Mycobacterium africanum in the last 17 years in Aragon identifies a specific location of IS6110 in Lineage 6

The purpose of this study was to increase our knowledge about Mycobacterium africanum and report the incidence and characteristics of tuberculosis (TB) due to its lineages in Aragon, Spain, over the period 2003–2019. The study includes all the cases in our region, where all the M. tuberculosis complex (MTBC) isolates are systematically characterised. We detected 31 cases of M. africanum among 2598 cases of TB in the period studied. TB caused by M. africanum is rare (1.19%) in our population, and it mainly affects men of economically productive age coming from West African countries. Among the isolates, Lineage (L) 6 was more frequent than L5. The genotyping of these strains identified five clusters and 13 strains with a unique pattern. The isolates’ characterisation identified a copy of IS6110 within the moaX gene, which turned out to be specific for L6. This insertion will allow the differentiation of this lineage from the rest of the MTBC with a simple PCR reaction. It remains to be established whether this polymorphism may limit M. africanum transmission. Furthermore, a mutation in the mutT2 promoter was found to be specific for L6 strains, which could be related to the high variability found for L6 compared to L5.

Social and clinical characterisation of the TB cases. The characteristics of the cases due to M. africanum are detailed in Table 1. Of the 31 patients studied, 77.41% were male; ages ranged from 18 to 62 years, and the 25–34-year age group was the largest (45.16%). No cases were detected in the youngest and the oldest age groups. Regarding their origin, 27 patients were born in West African countries (87.09%), three were Spaniards (9.67%), and one patient was born in Bulgaria. The African countries of origin are detailed in Fig. 1.
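The percentages in this section can be reproduced from the case counts. The following is a minimal sketch: the helper name `pct_truncated` is hypothetical, and the counts of 24 men and 14 patients aged 25–34 years are not stated in the text but are back-calculated here from the reported percentages. The reported figures appear to be truncated, rather than rounded, to two decimals, which is what the helper assumes.

```python
from fractions import Fraction
from math import floor


def pct_truncated(count: int, total: int, decimals: int = 2) -> float:
    """Return count/total as a percentage, truncated (not rounded) to `decimals` places."""
    scale = 10 ** decimals
    # Exact rational arithmetic avoids floating-point error before truncation.
    return floor(Fraction(count * 100 * scale, total)) / scale


TOTAL_TB_CASES = 2598   # all TB cases, 2003-2019 (from the text)
M_AFRICANUM_CASES = 31  # M. africanum cases (from the text)

print(pct_truncated(M_AFRICANUM_CASES, TOTAL_TB_CASES))  # share of all TB cases
print(pct_truncated(27, M_AFRICANUM_CASES))              # born in West Africa
print(pct_truncated(3, M_AFRICANUM_CASES))               # Spaniards
print(pct_truncated(24, M_AFRICANUM_CASES))              # male (count is an assumption)
print(pct_truncated(14, M_AFRICANUM_CASES))              # aged 25-34 (count is an assumption)
```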
According to the geographical location, more patients lived in an urban area than in a rural area (20, 64.51% vs 11, 35.48%). At least 12 of the cases lived in our country for less than
Genotypic characterisation of M. africanum isolates. The molecular analysis based on IS6110-RFLP and Spoligotyping of the M. africanum isolates showed five different clusters, including from two to six cases, and 13 isolates with a unique pattern (Fig. 2). Spoligotyping showed 13 different patterns, three of which were detected more than once (SIT 181 in 13 isolates, SIT 326 in six isolates, and SIT 1465 in two isolates), distributed in the AFRI_1, AFRI_2, AFRI_3 or AFRI families according to the SITVIT definition 23 . To confirm the M. africanum lineages, the specific differential regions TbD1, RD9, RD702, and RD711 were analysed. TbD1 was present, and RD9 absent, in all the isolates. Based on the study of RD702 and RD711, we could classify them into the two existing lineages of M. africanum. Twenty-four isolates belonged to L6 and seven to L5. There was total concordance between the classifications obtained by spoligotyping and the RD analysis. The isolates classified as AFRI_1 had the RD702 region deleted, and therefore corresponded to L6 isolates 2 . AFRI, AFRI_2, and AFRI_3 strains had RD702 present and, therefore, were considered L5. Two AFRI and two AFRI_2 spoligotype isolates had RD711 deleted and were sub-classified as L5.1. Meanwhile, the three AFRI_3 isolates had RD711 present, classifying them as L5.2 15 . The IS6110-RFLP showed a low number of IS6110 copies (≤ 6) in the L6 isolates, except one which showed eight copies. In contrast, the six L5 isolates with an available RFLP pattern presented more than 10 IS6110 copies (Fig. 2).
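The lineage assignment based on the regions of difference described above can be summarised as a small decision rule. This is only an illustrative sketch of the logic stated in the text; the function name and the handling of marker combinations not observed in this study (returned as unclassified) are assumptions.

```python
def classify_m_africanum(tbd1_present: bool, rd9_present: bool,
                         rd702_present: bool, rd711_present: bool) -> str:
    """Assign an M. africanum lineage from the presence/absence of four regions."""
    # All isolates in the study had TbD1 present and RD9 absent;
    # any other combination is treated here as unclassified (assumption).
    if not tbd1_present or rd9_present:
        return "unclassified"
    # RD702 deleted -> L6 (RD711 is not consulted for L6).
    if not rd702_present:
        return "L6"
    # RD702 present -> L5; RD711 splits L5 into sub-lineages.
    return "L5.2" if rd711_present else "L5.1"


print(classify_m_africanum(True, False, False, True))   # L6
print(classify_m_africanum(True, False, True, False))   # L5.1
print(classify_m_africanum(True, False, True, True))    # L5.2
```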
Following the strain characterisation and in the context of a new assay performed in our laboratory to rapidly identify resistances and lineages, 32 isolates of our DNA collection, including different MTBC lineages, were analysed using AmpliSeq-based methodology. Two of the 32 isolates belonged to L6 (isolates 15 and 27) and three
Specific IS6110 location in L6 strains. We studied the location of IS6110 in three L6 strains (isolates 2, 15, and 21) using ligation-mediated PCR (LM-PCR) within a study of MTBC strains with a low copy number of this IS. In addition to the copy located in the DR area, three locations were detected, in the Rv0963c, lipX:mshB and moaX genes. In the three strains, one of the IS6110 copies was located in the moaX gene, at an identical position in all three cases (Fig. 3). Based on the results obtained, the primers moaXr (ccagtcgacgcggttgggg) and moaXd (atcgggtcattaccggcggc) were designed to verify the point of insertion of IS6110. The expected PCR products were 2128 bp if IS6110 was present and 788 bp if IS6110 was absent from the site of amplification. We sequenced the amplified fragment, noting that IS6110 was inserted at nucleotide 3709622, relative to the H37Rv reference genome, flanked by three bp direct repeats (gac) as a consequence of the transposition, and located 90 nucleotides from the beginning of the moaX gene (Rv3323c) and in the same orientation (Fig. 3). Further analysis showed that this IS6110 copy was present in all the M. africanum strains of our collection belonging to L6 but never in our L5 strains. In addition, we observed that this location was absent in 42 isolates of low copy number studied by our group. We also analysed this insertion point in the M. africanum L6 strains whose genomes are available in the NCBI (CP010334.1 and FR878060.1), verifying the presence of IS6110 in the moaX gene (Fig. 4).
On the other hand, to investigate the intergenic IS6110 insertion in lipX:mshB as a possible specific location for M. africanum, we amplified the region with the primers LipX-F (gccgtttccccaatcgaatc) and LipX-R (gctcaggctctcatcgtcg). The expected fragment was 264 bp if the IS was absent and 1591 bp if it was present. The PCR results revealed the insertion in five out of nine L6 isolates tested and never in L5 isolates, which means it was not specific but frequent in L6 strains. IS6110 was inserted at nucleotide 1300194, flanked by two bp direct repeats (tt), in all the isolates at the same point, including those in the NCBI database. However, the location of IS6110 in Rv0963c was not detected in any other of the M. africanum isolates analysed.
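The interpretation of the two PCR assays described above (moaX and lipX:mshB) can be sketched as a comparison of the observed band size with the expected product sizes. The expected sizes are taken from the text; the function name, the dictionary layout and the 50 bp tolerance for reading a gel band are assumptions made for illustration.

```python
# Expected PCR product sizes in bp, as reported in the text.
EXPECTED_SIZES = {
    "moaX": {"absent": 788, "present": 2128},
    "lipX:mshB": {"absent": 264, "present": 1591},
}


def call_is6110(assay: str, observed_bp: int, tolerance_bp: int = 50) -> str:
    """Call IS6110 presence at a locus from the observed amplicon size.

    `tolerance_bp` is a hypothetical allowance for gel sizing error.
    """
    sizes = EXPECTED_SIZES[assay]
    if abs(observed_bp - sizes["present"]) <= tolerance_bp:
        return "IS6110 present"
    if abs(observed_bp - sizes["absent"]) <= tolerance_bp:
        return "IS6110 absent"
    return "inconclusive"


print(call_is6110("moaX", 2100))      # IS6110 present
print(call_is6110("moaX", 800))       # IS6110 absent
print(call_is6110("lipX:mshB", 900))  # inconclusive
```

Under the findings reported here, a "present" call in the moaX assay would point to L6, whereas the lipX:mshB insertion was found in only some L6 isolates and so cannot be used on its own.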
Nitrate reduction activity absent in M. africanum strains. Because IS6110 interrupts the moaX gene, which codes for enzymes involved in the synthesis of molybdenum cofactor (MoCo), necessary for the activity of the nitrate reductase (NR) enzyme 25 , we investigated whether this would be reflected in a difference in NR activity between the L5 and L6 strains. We analysed the reduction of nitrates by M. africanum isolate 5 (L5) and isolate 11 (L6), with H37Rv and BCG as positive and negative controls, respectively. However, both the L5 and L6 strains showed a negative result for NR activity. Both positive and negative controls were in line with expectations. We then analysed the sequences of the narG and narI genes, which did not present any mutation to explain the L5 strain result. Nevertheless, the analysis of the narGHIJ promoter in four L5 isolates and two L6 isolates showed an identical mutation at − 215 (t/c), which was also present in the NCBI complete genomes of M. africanum (L6) and M. bovis. Additionally, the study of the sequenced promoter region of the narGHIJ operon showed a mutation at − 13 (a/g) of the mutT2 gene, upstream of the narG gene, in the two L6 isolates, which was absent in the four L5 isolates tested. This was also observed in the M. africanum L6 genomes included in the NCBI database.

Discussion
This work was carried out to understand the epidemiological situation in Aragon, Spain, related to TB cases caused by M. africanum. This study used the data set on TB cases linked to the genotypes of the clinical isolates. The findings from this study indicate that M. africanum is a rare cause of TB in our region and represents 1.19% of the cases with available genotype data reported during the 2003–2019 period. A previous study regarding this causal agent 16 reviewed the percentage detected in other countries, such as Brazil, Australia and Portugal, where it represented less than 1% of the TB isolates. In their work, the authors collected information on 36 TB cases of M. africanum over a 10-year period in Spain. Nineteen of these isolates were from our region and are therefore included in this study. The fact that they did not systematically identify the M. africanum isolates leads us to believe that the TB cases caused by L5 and L6 in our country were underestimated. In contrast, our study has been conducted exhaustively since 2003, identifying all M. africanum cases. We consider that the incidence of M. africanum in our country should be low given the results observed in this study, even though in areas with higher African migration it could be slightly different. We hypothesised that the African lineages that are rare in our population are not adapted to transmit. The descriptive analysis of the TB cases caused by M. africanum showed that most were male (77.41%) and in the 25–34 age group (45.16%). All of them were in the labour force, which could be related to this being the most abundant age group among immigrants. The disease was more likely to occur in foreign-born people coming from West African countries (87%), with only three cases (9.67%) in Spanish-born people. In Esteban's study 16 performed in Spain, few Spaniards (7%), in contrast to immigrants, presented TB caused by M. africanum.
The slight difference we obtained may be because we performed an exhaustive genotyping of all the cases. In our study, the patients came from different rural areas in a higher percentage (35%) than patients with the common TB caused by M. tuberculosis, around 80% of which occurred in urban areas 26 . These associations suggest that the epidemiology of M. africanum in our region is driven primarily by the migration of people from West Africa. The TB in Spaniards suggests that transmission of M. africanum might occur in Spain, but the possibility of TB acquisition during a trip (e.g., to West Africa) cannot be excluded, as one of the Spaniards presented a unique genotype strain. It would be of interest to continue the study in the coming years to check whether the M. africanum strains of this work are maintained or are displaced by other MTBC strains. In previous reports, a lower transmission of M. africanum in comparison to L4 was observed. Nevertheless, the proportion of L5 and L6 is maintained over time, suggesting that other factors may be responsible for its continued presence in Africa 27 .
In half of the cases, the disease was restricted to a pulmonary location. The extrapulmonary type of the disease (48.38%) was identified in a higher percentage than among the TB case notifications in Spain in 2017 (27.5%) 28 . Some studies showed a high proportion of extrapulmonary TB caused by L5 strains, suggesting that these strains might show a different ability to cause pulmonary disease than M. tuberculosis sensu stricto strains 15 . Curiously, while the extrarespiratory location of the TB was frequent among the African cases, the three Spaniards presented only respiratory disease. The differential HLA distribution among the Mali population has been studied, and it was concluded that it might be at least partially responsible for the geographical restriction of M. africanum infections to West Africa 29 . The possibility that HLA could also affect the clinical presentation of the
For all cases, the TB bacilli were susceptible to the treatment, although the AmpliSeq method applied to seven M. africanum isolates detected some mutations in genes related to resistance. These polymorphisms could be specific evolutionary characteristics of the respective lineages. These results indicate that we must be cautious when reporting resistant genotypes, such as the mutations found in this study, which do not confer a resistance phenotype. Nevertheless, they could be assessed as specific for L6 (rpoB, inhA, and katG genes) and L5 (gyrA gene).
RFLP showed a substantial difference in the number of IS6110 copies between the L5 and L6 strains: L6 strains carried fewer copies than L5 strains. Spoligotyping and IS6110-RFLP allowed us to detect five clusters, including 17 cases. Although each technique has low discriminatory power separately, especially among low copy number strains, their discriminatory power increases when they are considered together. On the other hand, the location of some of the IS6110 insertion points adds differentiation capacity to the RFLP, as indicated in other publications where it has been described that RFLP analysis can underestimate the real copy number for the IS6110 element 33,34 . In this work, isolates 2 and 21 present three bands that seem identical when observing their RFLP pattern. However, although they share two locations (DR region and moaX), isolate 21 has an IS6110 in the Rv0963 gene, which isolate 2 does not share (Fig. 2). This indicates that, in some cases, the coincidence of a band in the RFLP pattern does not imply that the IS6110 insertion point is the same. The explanation would be that a small difference between the lengths of the restriction fragments generated for two different locations of IS6110 is not appreciable in the RFLP pattern. Despite this, transmission was not considered in this study, as it could overestimate the recent transmission rate.
In the context of a study of the IS6110 location in low copy number strains, we discovered an insertion within the moaX gene for the L6 strains analysed, and later we verified its presence in all L6 strains but never in L5 or other MTBC families studied. A previous work studied the insertion points of IS6110 in high-copy clinical isolates, specifically focusing on the Beijing genotype and revealed that its location in moaX gene was not characteristic of Beijing family 22 . Also, we found that in a previous work where the locations of IS6110 were studied in 579 MTBC strains representatives of the major lineages circulating in Europe and Latin America, the location of IS6110 in moaX was not detected in any case 21 . In all L6 strains included in our collection and the strains whose genomes are available online, the insertion point was the same. Altogether, it strongly suggests that this location is specific for L6, allowing us to differentiate this lineage from the rest of the strains of the MTBC. Within the scheme of the evolutionary stage of the tubercle bacillus, proposed by Brosch et al. 13 , we suggest the transposition of IS6110 into the moaX gene when L6 is separated from the rest of the lineages (Fig. 4). Besides, the location in lipX:mshB was frequent in L6 strains. These results agree with previous observations, indicating that each family has preferential insertion sites 21,22,35 , which is probably related to their evolutionary relationship.
The moaX gene encodes a molybdopterin (MPT) synthase with moaD and moaE activity that contributes to molybdenum cofactor (MoCo) synthesis in MTBC 25 . It has been shown that there is functional interchangeability between the MPT synthase subunits of M. tuberculosis, and in the case of MoaX, post-translational cleavage at the Gly82 residue is required for the functionality of this enzyme 36 . According to that, the IS6110 inserted in moaX gene of L6 strains is interrupting the MoaD subunit (Fig. 3). It has been described that some mutants in genes involved in molybdopterin biosynthesis had lost their ability to resist phagosome acidification 37 . In most molybdenum-containing enzymes, the metal is coordinated to the dithiolene group of MPT to form MoCo. Enzymes that utilise MoCo harness the redox properties of molybdenum to catalyse redox reactions in carbon, nitrogen, and sulfur metabolism and to reduce terminal electron acceptors for anaerobic respiration 25 . One of these enzymes is NarG, a membrane-bound respiratory NR, suggesting a potentially important role for MoCo in the metabolism of M. tuberculosis in vivo. In an anaerobic environment, many bacteria can use nitrate as a final electron acceptor. Historically, M. tuberculosis has been differentiated from M. bovis because only M. tuberculosis can reduce significant amounts of nitrate (NO 3 − ) to nitrite (NO 2 − ). NR activity occurs at a low level during the aerobic growth of M. tuberculosis and increases significantly upon entry into the microaerobic stage. When we discovered the IS6110 insertion in moaX for L6 strains, we expected to find differences in NR activity between L5 and L6, but none showed NR activity. This indicates that the disruption of the MoaD subunit from MoaX in the L6 strains is not the only one responsible for the lack of activity observed in vitro. 
This result supports the hypothesis that homologous genes could compensate for any adaptive disadvantage of the bacteria due to the natural knockouts created by IS6110 insertion or other mutations 25 . Looking for another explanation for this result, we analysed the narGHIJ operon implicated in NR activity. The first mutation described 38 that prevented NR activity was the − 215 (t/c) SNP in the promoter of the narGHIJ operon for M. bovis. M. africanum L5 and L6 have this mutation, but so does M. canettii, which has NR activity 39 . There is another region responsible for NR activity, the narK2 operon. A mutation in the − 10 promoter element of the narK2 operon reduced NR activity in BCG 40,41 . We found this mutation in the L6 strains available in NCBI but not in M. canettii, which had the same genotype as H37Rv. It seems that the presence of both mutations could explain the lack of NR activity we observed for M. africanum L5 and L6. However, in latent anaerobiosis, BCG overexpressed the narX gene, a fused NR 42 . Thus, a similar enzyme could play this role for M. africanum. Surprisingly, the search for mutations in the narGHIJ promoter led us to a mutation at − 13 (a/g) of the mutT2 gene, upstream of this operon, in the L6 strains analysed and in the NCBI complete genomes of M. africanum (L6), but not in the L5 strains analysed nor in other TB genomes available in NCBI. This gene was studied in the Beijing lineage as a possible cause of a greater number of SNPs related to resistance 43 . It has been observed that L6 has a higher variability in its genome in comparison to L5, which could be related to a higher mutation rate 44,45 . MutT2 is involved in DNA repair; therefore, the mutation detected in the mutT2 promoter could increase the polymorphisms in L6 strains 46 .
A possible limitation of this work is that the number of strains studied was low. Nevertheless, all the isolates have been exhaustively and systematically characterised in a continuous period of 17 years. Consequently, the results objectively reflect the incidence of M. africanum in our region. On the other hand, genotyping methods do not discriminate enough to analyse transmission, so that whole-genome sequencing of the isolates would be more informative.
In summary, the results of this study indicate that TB caused by M. africanum is rare in Aragon, and the majority of the cases were in immigrants from West Africa. L6 was more prevalent, with few cases of L5. As far as we know, this is the first time that IS6110 locations have been determined in M. africanum strains, which has allowed us to detect the presence of a copy of IS6110 in the moaX gene in all L6 strains. Further studies on the implication of interruption of MPT synthase subunit-encoding genes in the physiology of L6 strains and its possible relationship with lower virulence would be of interest. The analysis of this location showed that it is a specific characteristic of the L6 strains, which allows us to distinguish this lineage of M. africanum from the rest of MTBC in a simple and fast way, using a PCR-based test.

Material and methods
Origin of clinical isolates. In Aragon, a northern region of Spain, all MTBC isolates have been routinely genotyped for surveillance purposes since 2004, and isolates from 2003 were also registered in the context of a previous study. In this work, we selected all patients with a microbiological diagnosis of TB caused by M. africanum between 1 Jan 2003 and 31 Dec 2019. The demographic (age, sex, country of birth, years since entry to Spain) and clinical (location of disease, sputum smear status, and previous diagnosis of TB) characteristics of the patients were retrospectively reviewed.
DNA was frozen at − 80 ºC and used in the different molecular techniques in this study. All strains were systematically genotyped by restriction fragment length polymorphism (RFLP) based on IS6110 and by Spoligotyping. RFLP was performed as described by van Embden et al. 17 . Spoligotyping used a commercial membrane (Mapmygenome India Limited) to hybridise with the amplicons of the direct repeats region of each isolate. The procedure was previously described 48 . The genetic patterns were analysed with Bionumerics v7.6 software (Applied Maths, Kortrijk, Belgium) and introduced into the Database of the University of Zaragoza. TB cases caused by M. africanum were selected retrospectively by their spoligotype, a specific intermediate pattern between those of M. tuberculosis and M. bovis, according to the SITVIT definition 23 . Isolates were considered to be in a cluster if they carried an identical IS6110-RFLP pattern and, if they had fewer than five copies of IS6110, also the same spoligotype.
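The cluster definition given above can be sketched as a grouping rule. This is an illustration under one reading of that definition (an identical IS6110-RFLP pattern is always required, and an identical spoligotype is additionally required for isolates with fewer than five IS6110 copies). The `Isolate` record, its field names and the string encoding of patterns are hypothetical; the study itself compared patterns in Bionumerics.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Isolate:
    name: str
    rflp_pattern: str    # hypothetical encoding of the IS6110-RFLP band pattern
    spoligotype: str     # e.g. a SIT number or octal code
    is6110_copies: int


def find_clusters(isolates, low_copy_threshold=5):
    """Group isolates into clusters of two or more members."""
    groups = defaultdict(list)
    for iso in isolates:
        if iso.is6110_copies < low_copy_threshold:
            # Low copy number: RFLP pattern and spoligotype must both match.
            key = (iso.rflp_pattern, iso.spoligotype)
        else:
            # Otherwise an identical RFLP pattern is sufficient.
            key = (iso.rflp_pattern, None)
        groups[key].append(iso.name)
    return [sorted(names) for names in groups.values() if len(names) >= 2]


example = [
    Isolate("A", "p1", "SIT181", 3),
    Isolate("B", "p1", "SIT181", 3),
    Isolate("C", "p1", "SIT326", 3),   # same RFLP, different spoligotype
    Isolate("D", "p2", "SIT1465", 11),
    Isolate("E", "p2", "SIT326", 11),  # high copy number: RFLP alone decides
]
print(find_clusters(example))
```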

Study of single polymorphisms. A panel of primers focused on genes related to resistance in MTBC and SNPs for lineage differentiation was analysed by AmpliSeq-based methodology using next-generation sequencing.  point 378404, Rv0309), were amplified and analysed. In addition, the polymorphisms previously described for M. bovis in the narGHIJ operon were analysed by amplification of the different regions. The primers used were the following: mutT2F-2 (tccggatgatgatttacctcc) and mutT2R-2 (tccgccgggtcggggac) 43 ; narG-Fw (gcccagctttgacaccatcg) and NarG-Rv (gcccagatgacgtttcgccag); NarI-Fw (tggctaccactcggaatgac) and NarI-Rv (acgatgtagggccggaacag). The positions given refer to the NC_000962_3 reference sequence.

Location of IS6110 insertion sites.
To study IS6110 insertion sites, a ligation-mediated PCR was used as described by Prod'hom et al. 49 to amplify one or both ends of each copy of IS6110 and its flanking sequence. Briefly, genomic DNA was digested with SalI enzyme and ligated to a linker containing a SalI restriction site. The resulting template was then digested by SalI. PCR was performed using ISA1 or ISA3, specific primers for IS6110 directed outwards from this element 50 , and the linker primer Salgd. The template was initially denatured by incubation at 95 °C for 9 min and amplified by 35  followed by a final extension at 72 °C for 10 min. Amplified products were separated by standard horizontal gel electrophoresis in a 1.5% agarose gel in tris-borate-EDTA buffer (90 mM tris, 90 mM boric acid, 2 mM EDTA) and stained with ethidium bromide. PCR products were purified using the GFX PCR DNA and Gel Band Purification Kit (Amersham Pharmacia Biotech) followed by ExoSAP-IT PCR Product Cleanup Reagent (Affymetrix), sequenced and analysed for homology with Tuberculist (http://genolist.pasteur.fr/TubercuList).
Enzymatic assay of NR. The NR activity test was performed with actively growing cultures, which were inoculated directly into phosphate buffer supplemented with nitrate and incubated for 2 h at 37 °C. The mycobacteria were cultured on 7H10 agar supplemented with 0.2% glycerol and 10% albumin/dextrose/catalase (ADC). One L5 strain and one L6 were inoculated into phosphate buffer supplemented with 10 mM nitrate. Following 2 h of incubation at 37 °C, naphthylamide and sulfanilic acid reagents were added, and the colour was then observed 51 .
Ethics declarations. The requirement to obtain informed consent was formally waived by the Comité de Ética de la Investigación de la Comunidad Autónoma de Aragón (CEICA), Spain, CI.PI18/068. No human tissues were used in the study. Once the bacterial isolate was received, it was coded (NSTRAIN). The epidemiological data of the cases were sent by fax and were anonymised, keeping only the assigned code to track the analysis of the clinical characteristics, in order to follow the Helsinki ethical principles for medical research involving human data. The experimental protocol followed was reviewed and approved by the CEICA and is in line with the Declaration of Helsinki, as revised in 2013.

Data availability
Sequence data reported in the present study were deposited in GenBank under accession numbers MW987573, MW987574 and MW987575.