Phylogenomic Perspective on a Unique Mycobacterium bovis Clade Dominating Bovine Tuberculosis Infections among Cattle and Buffalos in Northern Brazil

Lack of routine surveillance in countries endemic for bovine tuberculosis (TB) and limited laboratory support contributes to the inability to differentiate the Mycobacterium tuberculosis Complex species, leading to an underestimated burden of the disease. Here, Whole-Genome Sequencing of Mycobacterium bovis isolated from tissues with TB-like lesions obtained from cattle and buffalos at Marajó Island, Brazil, demonstrates that recent transmission of M. bovis is ongoing at distinct sites. Moreover, the M. bovis epidemiology in this setting is herein found to be dominated by an endemic and unique clade composed of strains evolved from a common ancestor that are now genetically differentiated from other M. bovis clades. Additionally, envisioning a rapid strain differentiation and tracing across multiple settings, 28 globally validated strain-specific SNPs were identified, three of which considered as robust markers for the M. bovis Marajó strain. In conclusion, this study contributes with data regarding the identification of a novel M. bovis phylogenetic clade responsible for ongoing transmission events in both cattle and buffalo species in Brazil, provides a framework to investigate the dissemination of this highly prevalent strain and, holds the potential to inform TB control strategies that may help to prevent the spread of bovine and zoonotic TB.

: the predominant spoligotype was SB0822/SIT997, detected in 21 isolates, which is characterized by the absence of spacers 3,4,9,16, and 39 to 43; the remaining isolate corresponded to the spoligotype SB0885/SIT986, which lacks spacers 3, 4, 5, 6, 9, 16 and 39 to 43. Comparing both profiles, one can speculate that the SB0822/SIT997 profile is the parental strain with the isolate belonging to SB0855/SIT986 type being a derived strain that underwent diversification at the DR locus by losing the 5 th and 6 th spacer regions as this is the only difference between these isolates and is coherent with the unidirectional evolution observed at the DR locus 12 .
To further support this notion, and contrary to spoligotype diversity, 24-loci MIRU-VNTR of 20 M. bovis isolates recovered from 17 animals (two isolates/animals were excluded as these failed to amplify more than 10 MIRU-VNTR loci after multiple attempts) revealed a more diverse scenario composed of five profiles. Only VNTR loci 2165 (ETRA), 2461 (ETRB), 1644 (MIRU 16), and 2347 (Mtub29) presented allelic diversity amongst the isolates with a total of three clusters herein detected encompassing a total of 18 isolates.
The most common MIRU-VNTR profile was shared by 14 isolates, the second and third most common MIRU-VNTR profiles were shared by two isolates and, two isolates exhibited unique profiles and were therefore classified as non-clustered strains (Table 1). Noteworthy, under the 24-loci MIRU-VNTR typing method, the single SB0855/SIT986 clinical isolate was clustered in the largest MIRU-VNTR cluster, mostly comprised by SB0822/ SIT997 isolates (Table 1 and Fig. 1).
The latter lends support to the previous notion that SB0855/SIT986 is a likely divergent strain from the predominant SB0822/SIT997 strains detected in the Marajo Island (Fig. 2). Overall, both MIRU-VNTR and spoligotyping suggest that a highly clonal M. bovis population structure exists in the Marajó Island albeit slightly more diverse under a MIRU-VNTR perspective as would already be expected given its superior discriminatory power 14,15 . Genomic diversity of M. bovis in the Marajó island reveals the Marajó M. bovis strain as a unique monophyletic branch within M. bovis. Out of the 22 M. bovis isolates sequenced, 17 genomes were retained for downstream genome-wide phylogenetic analysis after genome quality control. These isolates originated from the four municipalities: Soure (n = 9), Chaves (n = 6), Cachoeira do Arari (n = 1) and Santa Cruz do Arari (n = 1) ( Table 1).
Search for drug resistance-conferring mutations using TB-Profiler 16 showed that all isolates were genotypically susceptible to all anti-TB drugs except PZA due to the M. bovis specific H57D missense mutation in the pncA gene of M. bovis 17 .
Upon phylogenetic analysis, the Marajó Island M. bovis isolates were found to comprise a monophyletic clade composed of isolates within a maximum pairwise distance of 64 SNPs and is herein referred as the "Marajó-strain" (Fig. 1). These findings are also corroborated by the analysis of the frequency distribution of pairwise SNP distances between isolates from the Marajó Island (0-64 SNPs) which, depending on the isolate are 445-488 and 995-1039 SNPs apart from M. bovis AF2122/97 and M. caprae EPDC01, respectively (Fig. 2). Also, three genomic clusters (C1-3) harboring isolates within a maximum pairwise distance of 5 SNPs were detected with each cluster being composed of isolates originating from the same city within the Marajó Island and isolated from the same animal species. Moreover, the topology of the reconstructed phylogenetic tree is suggestive of historical dissemination by this strain across both cattle and buffalo species and, at multiple geographical locations in the Marajó Island ( Fig. 1).
Next, in order to provide an adequate evolutionary background for this strain, we compared the SNP pairwise distance between the isolates of our study and a global M. bovis dataset comprised of 3 402 M. bovis genomes publicly available, along with two genomes obtained from ANSES (Maisons-Alfort, France), isolated in France and belonging to the SB0822 and SB0855 types (Supplementary Table S1; Fig. 2). This global dataset showed an ample and continuous distribution of pairwise SNP distances, congruent with the M. bovis global phylogenetic representation of this dataset (Fig. 2). Across these, only eight genomes were found to be within a maximum pairwise distance of 150 SNPs from the Marajó strains. To further investigate the phylogenetic context of the the Marajó strains, a total of 240 M. bovis isolates, representative of 150 SB types (along with 18 spoligotyping patterns not found in the Mycobacterium bovis Spoligotyping Database) and originating from over 15 countries, were selected for maximum-likelihood phylogenetic analysis (Fig. 3). This sample of 240 M. bovis genomes included the SB0822 and SB0885 strains isolated in France with the same spoligotype pattern as the Marajó isolates and the M. bovis reference AF2122/97. The phylogenetic tree obtained confirmed the monophyletic nature of the Marajó M. bovis isolates in a global context, and, albeit unclassified as per the current rules and genetic markers associated with the different clonal complexes, these strains did form a parallel branch to the European 2 clonal complex 18 and therefore shared a more recent common ancestor with this specific clade when compared with the other clonal complexes. The French strains, albeit sharing the same spoligotyping profiles, were positioned to distinct M. bovis evolutionary branches and were separated from the Marajó strains by 413-492 SNPs. Despite the structural similarity at the DR locus, the phylogenetic positioning and topological structure of the phylogenetic tree denotes genotypic convergence at the DR locus level by distinct and unrelated clades: the Marajó strain in Brazil and the A11 (SB0822) and C8 (SB0885) French isolates. Minimum spanning trees constructed using the goeBURST algorithm also position the Marajó strains as a separate branch close to strains from Germany and among unclassified isolates regarding the M. bovis clonal complexes (Supplementary Figures S1 and S2).

Isolate
Year Infection site Animal City www.nature.com/scientificreports www.nature.com/scientificreports/ the Marajó M. bovis strain comprises a genetically differentiated family. Given the monophyletic nature of the Marajó strains in a global evolutionary context, we next sough to investigate the genetic differentiation of these strains when compared with the remaining 240 M. bovis isolates by using a total of 11 544 core SNPs as variable genetic loci. Principal Component Analysis (PCA) across three main components shows www.nature.com/scientificreports www.nature.com/scientificreports/ a clear genetic distinctiveness between the Marajó strains and the remaining strains that were herein included for comparative purposes. The PCA also showed that all Marajó strains cluster together across the three main axes and appear to be positioned more closely to isolates from Germany and the United States of America (USA) which, is also compatible with the topology of the phylogenetic tree (Fig. 4). Also, Principal Coordinate Analysis (PCoA) encompassing country of isolation and clonal complex as population descriptives, also do demonstrate that the Marajó strains and the Marajó Island epidemiology is genetically differentiated from other geographical sites and genetic groupings (Fig. 4). This latter notion was also corroborated by pairwise F ST distances calculated between the Marajó strains and the previously described clonal complexes (European 1-2 and African 1-2) 18-21 which convey a notion of a higher level of genetic differentiation of the Marajó strain when compared with other sub-populations of M. bovis (Table 2). In fact, all populations were significantly differentiated although unclassified strains did show a lower level of genetic differentiation, probably owing to its polyphyletic nature ( Fig. 5 and Table 2).
Defining a specific set of SNPs to trace and screen for the Marajó M. bovis strain. In order to facilitate rapid strain screening across publicly available genome data and to enable the implementation of laboratory fast tracking of this strain across different settings, we defined a set of specific SNPs that can be used as a variant signature set for this specific clade. Using ancestral reconstruction methods, we compared the most recent common ancestor (MRCA) of the Marajó clade with ERR2815574, herein used as the outgroup isolate (Fig. 3). Using this approach, we detected 28 SNPs occurring between the two MRCA nodes of the phylogenetic tree.
Ideal clade-specific SNPs were defined as synonymous variants occurring in essential genes, which simultaneously decreases the likelihood of homoplasy driven by convergent evolution while ensuring that the region associated with the variant is conserved. Eight out of the 28 SNPs were found to be intragenic and synonymous and, of these, 3 were found to occur at genes know to be essential in M. tuberculosis H37Rv 22 . To validate this specific set, the 3 402 M. bovis publicly available genomes were re-screened for the initially identified SNPs. All 28 SNPs were found to occur only among the M. bovis isolates from Marajó Island with SNPs 2718745 T, 2830430 C and 3858304 C (M. bovis AF2122/97) considered as robust clade-specific SNPs for the Marajó strain (Table 3).

Discussion
Livestock production is of the utmost economic importance in Brazil and is the mainstay of most inhabitants in the Marajó Island. The latter is the largest fluviomarine island in the world and its barriers to gene flow are highly relevant in the understanding of microbial biodiversity in socioeconomically relevant diseases, such as TB, and of the adaptation of local strains to specific ecological niches 23 . Over recent years, several aspects of the Marajó Island in the State of Pará has motivated special concern regarding bTB: the region is isolated from the mainland and notification of human TB by M. bovis is inexistent along with a very close contact of humans and animals (buffalo and cattle).
Two unusual spoligotypes (SB0822/SIT997 and SB0855/SIT986) of rare occurrence in Brazil were the only two profiles found among the 18 animals whose isolates were analyzed. This prompts us to a scenario of high strain endemicity and suggestive of a low, if any, strain flow from outside of this insular system. Moreover, 24-loci www.nature.com/scientificreports www.nature.com/scientificreports/ MIRU-VNTR typing yielded five profiles and three clusters were detected, yielding a clustering rate of 90%, and was unable to discriminate the single SB0855/SIT986 strain from most SB0822/SIT997 isolates. This preliminary data suggested that the single SB0855/SIT986 strain descends from the SB0822/SIT997 isolates and it also www.nature.com/scientificreports www.nature.com/scientificreports/ suggests that bTB epidemiology in this setting is dominated by a single strain undergoing clonal expansion across different herds of cattle and buffalos.
The SB0855/SIT986 profile is an uncommon one but it has been previously detected in European countries, such as France, Germany, Spain and Belgium. On the other hand, the SB0822/SIT986 profile is not as uncommon and, according to SITVIT2 and Mycobacterium bovis Spoligotyping Database (Mbovis.org), it shows widespread distribution across Europe, South America and North Africa with increased incidence in France. This latter profile has been in fact detected in Southeast Brazil (Minas Gerais and Espírito Santo states) but at a low prevalence when compared with other spoligotyping profiles 24 .
A possible link to France can be speculated since it has been proposed that bTB in South America has been introduced via cattle importation from Europe 25,26 . However, an alternative origin is also plausible since in the end of the 19 th century a ship taking buffalos from India to the French Guyana sank near the Marajó coast with www.nature.com/scientificreports www.nature.com/scientificreports/ the surviving animals becoming well adapted to the island 27 . Before this event, no records of water buffalos exist in this setting.
A more resolved phylogenetic scenario was obtained using WGS of 17 isolates from animals. Upon examining the overall distribution of pairwise SNP distances, and considering the dominance of a single spoligotype profile, we have obtained a broad SNP distance distribution indicating many missing links across the transmission chains and a large timeframe for M. bovis evolution in the area.
The genomic findings do highlight the limitations of 24-loci MIRU-VNTR in this specific setting where this highly prevalent SB0822/SIT997 strain already underwent significant diversification on a genome-wide scale. A total of three genomic clusters were identified, two of which encompassing three isolates and the remaining genomic cluster being composed of two clinical isolates for a total of eight clustered isolates (47%). Each cluster was found to be restricted to the same city and animal species which, per se, suggests that transmission is occurring at multiple geographical points with unapparent dissemination to other animal herds. While the genomic clusters herein detected are restricted to animals of the same species (two involving cattle and one involving buffalos) the tree topology is consistent with an evolutionary history marked by cross-species strain dissemination at multiple points in time since a specific sub-lineage to cattle or buffalos is not observed. Though the present study failed to detect recent transmission between different species, we hypothesize that these transmission events may well occur but at a lower rate since the phylogenetic structure of the clade does support it. Moreover, comparison with a global genomic dataset confirmed the uniqueness and genomic distinctiveness of the M. bovis Marajó strain herein described. Although the Marajó strains comprise a monophyletic clade, these strains are not classifiable in a known clonal complex but do in fact form a parallel branch with the European 2 strains while showing a high genetic differentiation from these. M. bovis European 2 strains are more prevalent in the Iberian Peninsula, are present at low frequencies in France and Italy, and are absent from the British Isles 18 . The phylogenetic history and multivariate populational genetics analysis show that isolates detected in Germany and USA are genetically closer, arguing mostly in favor of the emergence of this strain due to cattle importation from Europe with later dissemination across Buffalo herds.    www.nature.com/scientificreports www.nature.com/scientificreports/ Although the samples included in the study originated from the only abattoir with veterinary inspection in the island and the animals included were sourced from multiple sites, a limitation of this study is its sample size which may have hampered the detection of rare inter-species transmission events. This limitation might be related with the fact that only samples from animals displaying anatomopathological abnormalities or lesions suggestive of bTB were included in the study. A study conducted in South Brazil (Santa Catarina State) revealed that in M. bovis cattle positive for tuberculin/PPD antigen, only 8% of the animals showed clinical signs 28 .
To enable a rapid molecular screening and tracing of this unusual strain across multiple settings, we sought to identify strain-specific SNPs that can inform rapid molecular assays. The genomic SNP distance between the Marajó isolates and other isolates bearing the same spoligotyping profile but isolated in France revealed that spoligotyping is not an adequate marker since the strains from France and Marajó were phylogenetically located on distinct branches of the M. bovis phylogenetic tree, more than 413 SNPs apart. Given the clonal population structure of the MTBC species, SNP markers pose as attractive candidates for strain differentiation 29 . We were able to identify 28 potential SNPs that can be used as a barcode for these strains. Moreover, a total of three SNPs comprised synonymous intragenic SNPs located in genes classified as essential in M. tuberculosis H37Rv by saturated Himar1 transposon libraries 22 , which decreases the probability that the associated regions are lost due to further genome downsizing. The latter is a major mode of genome diversification throughout the evolutionary history of the MTBC 30 .
This set of specific SNPs identified in this study therefore has the potential to be incorporated in molecular screening strategies aimed at detecting the Marajó M. bovis strain at a global level using in silico approaches or, at the state or national level in M. bovis strain biobanks to further evaluate the dissemination of this clade to other regions in Brazil and to assess the risk to human health. This type of approach, using allele-specific PCR amplification has been successfully used to evaluate the dissemination of specific strains outside its endemic region, understand the transmission dynamics at the cross-border level or enable the evaluation of TB transmission in settings without routine or universal molecular typing 31 .
Cultural aspects associated with living conditions in the Marajó such as drinking raw milk or eating dairy products produced from raw unpasteurized milk pose a specific threat to human health. So far, M. bovis infection in humans has not been reported or detected in the island population or in the state of Pará, however, this could be due to a combination of a low detection rate, inexistent laboratory confirmation of clinically suspected cases or identification at the species-level and, lack of routine strain typing at the state-level.
In conclusion, this study combines classical typing methods along with genome-wide sequence data in the identification and delineation of a unique M. bovis strain, that is herein designated as the Marajó strain. As the real magnitude and impact of bTB as a zoonotic disease in northern Brazil remains unknown and there is a considerable and underestimated bTB risk to humans in the Marajó Island, the study provides information on isolates from the livestock sectors in Brazil and on the origin of M. bovis strains. Such information is essential in the development and implementation of future bTB control strategies for the north of Brazil and, the molecular markers identified will aid in the development of rapid molecular assays that can be deployed at multiple settings and assess the specific risk posed by this strain to human health and food safety.

Materials and Methods
Data and sample collection. The present study includes a total of 20 M. bovis isolates from 18 different animals slaughtered at the Soure municipal abattoir, Marajó Island, in North of Brazil. These isolates ensued from the screening of 48 cattle and buffalo, slaughtered for meat production purpose, presenting TB-like lesions, from October 2014 to December 2015. Briefly, 13 isolates were obtained from nine cattle and the remaining nine isolates from buffalos. Different points of origin for these animals were detected within the island: six animals from Chaves, 10 from Soure, one from Santa Cruz do Arari and one from Cachoeira do Arari (Table 1). The samples were obtained during official post-mortem routine inspection performed by the veterinarian technical manager of the municipal abattoir and this study was submitted for ethical review by the Animal Use Ethics Committee of the State University of Pará, which issued an opinion waiver certificate in accordance with Brazilian Law 11794 (October 08, 2008).

Isolation and identification of mycobacterium bovis.
Tissue sample was macerated and decontaminated with SDS(3%) and NaOH(1%), neutralized with bromothymol blue solution and spread evenly onto Löwenstein-Jensen media slants supplemented with pyruvate or with pyruvate and p-nitrobenzoic acid. Screening for the presence of acid-fast bacilli was performed by Ziehl-Neelsen staining and microscopy. Isolates displaying a slow-growth rate incapable of growing on media supplemented with p-nitrobenzoic acid were presumptively identified as MTBC isolates. Molecular confirmation was carried out by PCR amplification and partial sequencing of the hsp65 gene as previously described 32 . DNA extraction was carried out using a phenol-chloroform extraction method 33 . Spoligotyping. Spoligotyping was performed as previously proposed by Kamerbeek et al. 34 and implemented on a Luminex 200 ™ platform automated system (Luminex Corp, Austin, TX) using polystyrene microbeads 35 .

Mycobacterial interspersed repetitive units-variable number of tandem repeat (MIRU-VNTR)
typing. PCR amplification of DNA for MIRU-VNTR typing was performed for a set of 24 tandem repeat loci as previously described 36 . PCR reactions were performed in using GoTaq ® DNA Polymerase following manufacturer's instructions (Promega Corporation, Wisconsin, USA). Amplicon sizing and allelic determination was www.nature.com/scientificreports www.nature.com/scientificreports/ performed by agarose gel electrophoresis on a 2% (w/v) agarose gel. A MIRU-VNTR cluster was defined as a group of two or more isolates sharing identical profiles.
Whole genome sequencing. DNA quantification was performed using the Qubit ™ dsDNA HS Assay Kit (Thermo Fisher Scientific, Waltham, USA) and the Agilent High Sensitivity DNA Kit (Agilent, California, USA). WGS of M. bovis isolates was carried out on a NextSeq instrument (Illumina, San Diego, CA) using a 2 × 150 paired-end chemistry and the Nextera XT library preparation kit (Illumina, San Diego, CA). Phylogenomic analysis. Quality trimming and filtering of Illumina reads were performed using the Trimmomatic v0.36 by cutting reads whose average quality falls below an average PHRED score of 20 in a 4 bp sliding window and retaining reads with minimum length of 36 bp 37 . Filtered reads were mapped against the M. bovis AF2122/97 genome (GenBank Accession NC_002945.4) 38 using the Burrows Wheeler Aligner tool (BWA-MEM algorithm) 39 . Mapping statistics were obtained using Qualimap 40 . SAMtools and GATK were used for variant calling and only concordant variants were retained for downstream analysis 41,42 .
High-quality SNPs were extracted and concatenated into a single DNA pseudo-molecule. A minimum site coverage of 20 reads were used and variant sites retained only if the majority nucleotide reached a relative coverage depth of 75%. A missing call was assigned if a base call did not met the previous criteria and samples or SNP sites having an excess of 10% missing calls were excluded as to remove heterogenic and low coverage sites 43 . SNP positions falling in mycobacterial PPE/PE genes and repeat regions were removed from the final alignment. The final dataset across 17 isolates and M. bovis AF2122/97 (reference) encompassed a total of 1 773 high-quality SNPs, 1 559 (87.9%) of which had no missing genotypes (core SNP sites). Pairwise SNP distance was evaluated using snp-dists (https://github.com/tseemann/snp-dists).
A maximum-likelihood phylogenetic tree rooted on Mycobacterium caprae EPDC01 44 was created with SeaView v.4 software implementing PhyML using a Generalized Time Reversible (GTR) model without gamma variation and invariable sites 45 . Tree topology was optimized by searching the tree space using the Subtree Pruning and Regrafting (SPR) and, the Nearest Neighborhood Interchange (NNI) methods. Branch reliability was estimated using the approximate Likelihood Ratio Test (aLRT) 46 .
The resulting tree was annotated and rooted using the Interactive Tree of Life v5.3 (iTOL) online tool (available at https://itol.embl.de/) 47 . Ancestral sequence reconstruction was carried out using the package phangorn in R. The goeBURST/Phyloviz tool (available at https://online2.phyloviz.net) was used to identify genomic transmission clusters using a 5 SNP threshold and to generate minimum spanning trees.
Additionally, variants associated with drug resistance were detected using TB-Profiler v2.6.1 (https://github. com/jodyphelan/TBProfiler) 16,48 . comparison with a global M. bovis dataset. Genomic variant data was compared to a global collection of 3 402 M. bovis genomes publicly available on the European Nucleotide Archive (ENA; Supplementary Table S1) until December 2018 (initial: 3 590, 188 isolates excluded due to low coverage of sequence data or inability to generate a spoligotyping profile using SpoTyping). This genome collection is part of a M. bovis genomic variant dataset available at iMed.ULisboa and, is composed of variant call data and individual site mapping statistics obtained through the Snippy mapping pipeline using M. bovis AF2122/97 (GenBank Accession NC_002945.4) as reference genome 49 . Two additional M. bovis genomes from ANSES (Agence Nationale de Sécurité Sanitaire de l'Alimentation, de l'Environnement et du Travail), Maisons-Alfort, France, sharing similar spoligotyping profiles were included (ENA study accession SRP161870). SpoTyping was used to determine in silico spoligotyping profiles for all isolates 50 . If available, metadata on the year of isolation and country of origin was extracted from sample associated XML files available at ENA. Genome-wide high-quality SNPs (Total: 42 419 segregating sites) were extracted from VCF files and coverage-validated using the same parameters described above across the entire genomic dataset, including the Marajó isolates. A phylogenetic tree was constructed for a final dataset composed of 257 M. bovis isolates which included the 17 isolates recovered at the Marajó Island, the two additional M. bovis isolates from ANSES and 237 M. bovis genomes of isolates from more than 16 countries, representative of all spoligotypes (SB type, Mbovis.org) available across the 3 402 M. bovis isolates whose genome was publicly available. For this final dataset, high-quality SNPs were obtained as described for the 17 Marajó M. bovis isolates totalling 20 103 high-quality SNP sites, of which 11 544 (57.4%) had no missing genotypes (core SNPs).
Population genetics. Populational genetic multivariate analysis was carried out in R using the adegenet package. Principal Component Analysis (PCA) and Principal Coordinate Analysis (PCoA) was carried out for the 257 M. bovis dataset using the 11 544 coreSNPs identified across this sample and along three principal components. Inter-populational comparison and differentiation analysis was done by considering the previously described Clonal Complexes (European 1-2 and African 1-2) and assigning each isolate to one of these populations based on specific genomic markers 18,[20][21][22] . Isolates deemed unclassifiable as per this scheme were assigned to the Unclassified population and isolates recovered at the Marajó Island were, given its monophyletic nature, assigned to the Marajo population. Pairwise F ST distances between populations was computed as a metric of populational paiwise differentiation using R along with the dartR and StAMPP packages implementing the method described by Weir and Cockerham (1984) with 1 000 bootstraps 51 .

Data availability
Raw sequence data has been submitted to the ENA under the study accession ERP116404. Sample information along with run accession numbers for publicly available genomes used in this study can be found at the Supplementary Table S1.