A Major Mycobacterium tuberculosis outbreak caused by one specific genotype in a low-incidence country: Exploring gene profile virulence explanations

Denmark, a low-burden tuberculosis (TB) country, still experiences significant active Mycobacterium tuberculosis (Mtb) transmission, especially of one specific genotype, Cluster 2/1112-15 (C2), the most prevalent lineage in Scandinavia. In addition to environmental factors, antibiotic resistance, and human genetics, there is increasing evidence that Mtb strain variation plays a role in the outcome of infection and disease. In this study, we explore the reasons for the success of the C2 genotype by analysing strain-specific polymorphisms identified through whole genome sequencing of all C2 isolates identified in Denmark between 1992 and 2014 (n = 952), and the demographic distribution of C2. Of 234 non-synonymous (NS) monomorphic SNPs found in C2 in comparison with the Mtb reference strain H37Rv, 23 were in genes previously reported to be involved in Mtb virulence. Of these 23 SNPs, three were specific to C2, including an NS mutation in a gene associated with hyper-virulence. We show that the genotype is readily transmitted among people of different ethnicities and is also found outside Denmark. Our data suggest that strain-specific variation in virulence factors is important for the success of the C2 genotype. These factors, likely in combination with poor TB control, seem to be the main drivers of C2 success.

Scientific Reports | (2018) 8:11869 | DOI: 10.1038/s41598-018-30363-3

We have previously characterized the C2 outbreak using whole genome sequencing (WGS) of a sparse time series of 115 isolates (five from each of the years 1992-2014) and showed that it was a clonal outbreak belonging to MTBC lineage 4.8, with two discernible phylogenetic clades, a major and a minor, and a most recent common ancestor dating back to 1959 (95% CI 1944-1973), pointing to its introduction into Denmark sometime after the Second World War 11 . The closest known related strains originated in Russia but were separated from C2 by at least 200 years. Therefore, although the exact route by which C2 reached Denmark remains elusive, our initial analysis raises the question of whether the current success of the C2 outbreak is attributable to a unique genetic virulence profile acquired before its introduction into Denmark. To investigate the potential biological basis for the success of this Mtb strain, we extend our WGS analysis to all available C2 isolates identified between 1992 and 2014 to pinpoint all universally preserved mutations, allowing a detailed analysis of all C2-specific polymorphisms. As the definition of virulence is still widely debated, we use the terms virulence and success interchangeably.

C2 in Denmark.
We extended our WGS to include all C2 Mtb isolates identified in the Kingdom of Denmark from 1992 to 2014 by either Mycobacterial Interspersed Repetitive Units-Variable Number of Tandem Repeats (MIRU-VNTR) typing or Restriction Fragment Length Polymorphism (RFLP) analysis. This initially comprised 989 isolates, but the figure was later reduced to 952 isolates from 892 patients, because 8 strains had been misidentified as Cluster 2 or 1112-15 and 28 strains were excluded owing to lack of growth (n = 12) or insufficient sequence coverage (n = 18). The C2 strains were found in patients of different nationalities and ethnicities (630 Danish-born (DB), 217 Greenlandic-born (GB) and 45 foreign-born (FB) from Africa, the Middle East, Asia and Europe; own data). All Mtb strains had been tested for susceptibility to the four standard first-line drugs, identifying only one strain resistant to isoniazid and another resistant to pyrazinamide. The median sequencing coverage of the 952 strains was 39.9× [IQR: 28.5-57.6].
Genomic analysis of C2. Analysing the set of 952 isolates, we identified 1309 high-quality SNPs, of which 414 (Supplementary Table S1) were present in all C2 strains relative to the H37Rv reference, hereafter referred to as monomorphic SNPs. The remaining 895 mutations arose during the C2 outbreak, and 81 of these (9%) occurred in 5 or more strains. Of the 414 monomorphic SNPs, 234 (57%) were non-synonymous (NS), 133 (32%) were synonymous and 47 (11%) were intergenic, corresponding to an overall dN/dS ratio of 0.64.
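The dN/dS ratio normalizes the observed non-synonymous and synonymous SNP counts by the number of non-synonymous (N) and synonymous (S) sites available in the coding sequence. The exact method used here follows ref. 26 and is not reproduced in this section; the sketch below assumes a simple Nei-Gojobori-style site count with a Jukes-Cantor correction, and all function names are hypothetical. Mutations to stop codons are counted as non-synonymous.

```python
import math
from itertools import product

# Codon table in TCAG order (NCBI translation table 1). Bacterial table 11, used
# for Mtb, has the same amino-acid assignments and differs only in start codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_sites(codon: str) -> float:
    """Synonymous sites in one sense codon; each of the 3 alternatives per position weighs 1/3."""
    aa = CODON_TABLE[codon]
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            # A change to a stop codon alters the product, so it is not synonymous.
            if CODON_TABLE[mutant] == aa:
                sites += 1 / 3
    return sites


def count_sites(cds: str) -> tuple[float, float]:
    """Return (synonymous, non-synonymous) site totals for an in-frame coding sequence."""
    cds = cds.upper()
    s_sites, n_codons = 0.0, 0
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        if CODON_TABLE.get(codon, "*") == "*":
            continue  # skip stop codons and codons containing ambiguous bases
        s_sites += synonymous_sites(codon)
        n_codons += 1
    return s_sites, 3 * n_codons - s_sites


def jukes_cantor(p: float) -> float:
    """Correct an observed proportion of differing sites for multiple hits."""
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(n_diffs: int, n_sites: float, s_diffs: int, s_sites: float) -> float:
    """Non-synonymous over synonymous substitutions, each per available site."""
    return jukes_cantor(n_diffs / n_sites) / jukes_cantor(s_diffs / s_sites)
```

Applied genome-wide, one would sum `count_sites` over all H37Rv coding sequences and call `dn_ds(234, n_sites, 133, s_sites)`; the result depends on those site totals, so no value is asserted here.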
Of the 234 NS monomorphic SNPs, 58 were in genes involved in cell wall and cell processes, 52 in intermediary metabolism and respiration, 51 in conserved hypotheticals, 32 in lipid metabolism, 16 in information pathways, 11 in virulence, detoxification and adaptation, 11 in regulatory proteins, 2 in insertion sequences and phages, and 1 in a gene of unknown function. Depending on the TubercuList functional category, SNPs were found in 1-13% of the genes in that category (Table 1). Genes involved in cell wall and cell processes and in intermediary metabolism and respiration are highly abundant in the Mtb genome 12 ; nevertheless, we found more SNPs in these categories than expected (Table 1).
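Table 1 compares observed with expected SNP counts per TubercuList category, but how "expected" was computed is not stated in this section. One simple null model, assumed here for illustration only, distributes the 234 NS SNPs across categories in proportion to their gene counts and asks how surprising an excess is under a binomial model; a gene-length-weighted null is a reasonable alternative and can give different expectations. The category size and observed count below are hypothetical, not Table 1 values.

```python
from math import comb


def expected_snps(total_snps: int, genes_in_category: int, total_genes: int) -> float:
    """Expected SNPs in a category if SNPs fell on genes in proportion to gene counts."""
    return total_snps * genes_in_category / total_genes


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """One-sided P(X >= k) for X ~ Binomial(n, p): seeing k or more SNPs by chance."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical category holding 500 of 4000 genes, with 45 observed NS SNPs.
total_ns = 234
expected = expected_snps(total_ns, 500, 4000)
p_value = binomial_upper_tail(45, total_ns, 500 / 4000)
print(f"expected {expected:.2f}, P(>=45) = {p_value:.4f}")
```

The proportional null ignores gene length, so long genes (common in cell wall and lipid metabolism categories) would tend to look enriched under it.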
Universally conserved single-nucleotide polymorphisms in C2. Of the 414 monomorphic SNPs (Supplementary Table S1), 244 (59%) were universally conserved among reference strains from a global collection of MTBC strains 13 , meaning that these differences most likely stem from mutations that arose in the lineage leading to the reference strain H37Rv after MTBC lineages 4.8 and 4.9 diverged. Furthermore, 24 SNPs had previously been identified as universally conserved among strains belonging to MTBC lineage 4.8 14 , and 89 SNPs were also conserved among the five strains from Samara, Russia (SAM5), previously identified as being more closely related to C2 than all other global strains from MTBC lineage 4.8 (Fig. 1A) 11 . Thus, 57 SNPs (14%) were uniquely conserved in C2 (26 synonymous, 26 non-synonymous and 5 intergenic). Correspondingly, the dN/dS ratio decreased significantly, from 0.64 to 0.36, as SNPs shared with progressively more distantly related strains were removed from the analysis, indicating much stronger purifying selection (Fig. 1B).
Potential virulence factors. We used the NS monomorphic SNPs to screen for potential virulence factors by searching the literature for associations between virulence and any of the genes carrying NS SNPs. Of the 234 NS monomorphic SNPs, 23 were in genes previously described as involved in Mtb virulence (Table 2). Of the 57 SNPs uniquely conserved in C2, three were non-synonymous mutations in genes previously associated with virulence.
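The stepwise classification of the 414 monomorphic SNPs (244 + 24 + 89 + 57) can be expressed as successive set differences over SNP positions. This sketch assumes that each SNP is attributed to the most distantly related group that shares it, which is consistent with the four counts summing to 414; the function name and the toy positions are hypothetical.

```python
def partition_monomorphic(c2, global_conserved, lineage_4_8, sam5):
    """Assign each C2 monomorphic SNP to the most distant group that shares it."""
    remaining = set(c2)
    layers = {}
    # Order matters: from most distantly to most closely related comparison group.
    for name, shared in (
        ("global MTBC", global_conserved),
        ("lineage 4.8", lineage_4_8),
        ("SAM5", sam5),
    ):
        layers[name] = remaining & set(shared)
        remaining -= set(shared)
    layers["C2-unique"] = remaining  # not shared with any comparison group
    return layers


# Hypothetical SNP positions.
layers = partition_monomorphic(
    c2=range(1, 11),
    global_conserved={1, 2, 3, 4},
    lineage_4_8={4, 5},
    sam5={5, 6, 7},
)
for name, positions in layers.items():
    print(name, sorted(positions))
```

Position 4 is shared with both the global set and lineage 4.8 but is counted only once, in the more distant group, so the layers never overlap.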

Discussion
In this study, we analysed a major cluster of Mtb isolates (C2), associated with a TB outbreak with significant ongoing transmission, for universally conserved genetic traits. The outbreak was initially confined to the capital, Copenhagen, predominantly among socially marginalized persons in the inner city, but subsequently spread throughout the Kingdom of Denmark, including Greenland, and to neighbouring countries. Clinical TB is influenced by variability in the host's genetic background, immune status, diet, and social and environmental factors 15,16 , but little is known about the bacterial factors, in particular genetic diversity in bacterial virulence factors, that contribute to variable host responses. The lifetime risk of developing active TB for a person latently infected with Mtb is around 12% 17 . Why some individuals are more prone to developing TB has been discussed widely, and a number of risk factors, such as HIV infection and immunosuppression, social factors, incarceration, and drug abuse, have been described 18 . Human genetic variation has also been suggested to play a role in susceptibility to TB 3,19 . The success of C2, however, does not seem to be strongly influenced by human genetic factors, as in Denmark it is found among people of nationalities and ethnicities as diverse as ethnic Danes and ethnic Greenlanders.
The main contributors to the success of the C2 strain are likely a lack of TB control and social problems, as reported from other settings 8 . However, in Greenland, a total of 80 different MIRU-VNTR genotypes were observed between 1992 and 2014. Of these, only 30 clustered with at least one other strain, and only 13 of these clusters were seen in more than 10 patients, 3 of which are more abundant than C2. The most frequent of these was found primarily in a remote setting in East Greenland 20 and was therefore excluded from this comparison. C2 was introduced into Greenland in 2001 and has spread successfully in a country already fighting a heavy TB burden, suggesting that this particular strain, as well as the GC2 subtype 21 , may have an advantage over the many subtypes introduced in the same period (Fig. 2A). The majority of the C2 cases from […] (Fig. 2A). As the vast majority of these cases belong to the same subgroup (Fig. 2B), this is a clear indication that C2 in Greenland stems from a single introduction rather than several independent introductions of the strain. The previously reported C2 mutation rate of 0.24 SNPs/genome/year 11 is consistent with previous findings 24-26 and is in fact lower than some other estimates 27,28 , indicating that the success of C2 is unlikely to have been caused by hypermutation. It has previously been reported that lineage 4 has a lower mutation rate than lineage 2 5 , which several studies report as the most frequent 20,25,27,29 . Furthermore, since lineage 4 is less prone to acquiring drug resistance than the Beijing lineage (lineage 2), other factors must contribute to its worldwide success.
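To put the clock rate of 0.24 SNPs/genome/year in perspective, it can be converted into a per-site rate and into the number of SNPs expected to accumulate along a single lineage since the estimated 1959 common ancestor. This sketch assumes a strict clock and uses the full H37Rv chromosome length (4,411,532 bp) as the denominator; because repetitive regions are excluded from SNP calling, the effective callable genome is smaller, so the per-site rate shown is a lower bound. The function names are hypothetical.

```python
H37RV_LENGTH_BP = 4_411_532  # length of the H37Rv reference chromosome


def per_site_rate(snps_per_genome_per_year: float, genome_length_bp: int = H37RV_LENGTH_BP) -> float:
    """Convert a whole-genome clock rate into substitutions per site per year."""
    return snps_per_genome_per_year / genome_length_bp


def snps_along_lineage(snps_per_genome_per_year: float, years: float) -> float:
    """SNPs expected on one branch over a period under a strict molecular clock."""
    return snps_per_genome_per_year * years


rate = 0.24
print(f"{per_site_rate(rate):.2e} substitutions/site/year")  # 5.44e-08
print(f"{snps_along_lineage(rate, 2014 - 1959):.1f} SNPs since 1959")  # 13.2
```

Two isolates sampled in 2014 would be expected to differ by roughly twice this number, since each branch from the common ancestor accumulates mutations independently.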
The overall dN/dS ratio of 0.64 is also consistent with previous findings 8,30 and does not suggest positive selection before the introduction of C2 into Denmark. In fact, there was a clear trend towards stronger purifying selection (a decreasing dN/dS ratio) over the last thousand years (Fig. 1B). As suggested by Pepperell et al., this is most likely attributable to the explosive growth of the human population over this period 31 .
SNPs were found in all functional gene categories. Most were in genes involved in cell wall and cell processes and in intermediary metabolism and respiration, which is in accordance with these being the largest categories 12 ; however, we found more SNPs in them than expected. High-throughput sequencing has yielded functional genomic data for many organisms, but a large proportion of genes are still labelled "hypothetical" or "unknown". For Mtb, this is the case for 27% of the genes (1057/3863) 12 , and it has been speculated that a better understanding of these genes might lead to a better understanding of virulence. We found 53 monomorphic NS SNPs in these uncharacterized genes. Although this is fewer than the 68 expected (Table 1), until these genes are better understood we can only speculate whether they play a role in the success of C2.
Our literature search identified reports linking 23 of the genes carrying NS monomorphic SNPs in C2 to virulence. Three of these SNPs were found exclusively in C2 (Table 2). Examples include mazE2, a toxin-antitoxin (TA) gene reported to help induce dormancy and persistence of the bacteria 32 , and pks1/15, in which an intact gene is reported to contribute to virulence by suppressing the human innate immune response 33 . Another 4 of the 23 virulence-associated SNPs were found only in C2 and in its closest relatives, the SAM5 group 11,34 (Table 2). Among these are genes encoding MCE family proteins, which play a role in adhesion to, invasion of, and survival inside macrophages 35 . Furthermore, when these genes were cloned into a non-pathogenic strain of E. coli, they conferred on E. coli the ability to enter and survive in mammalian cells, including macrophages 36,37 .
We also observed a monomorphic SNP in the tcrY gene. Interestingly, an additional mutation in this gene is among the 81 most abundant SNPs that arose within the C2 outbreak. This mutation is present in 844 of 864 (98%) C2 strains in the major clade and could therefore be a contributing factor to the much larger number of strains in the major clade than in the minor clade (85 strains). Mtb encodes 12 two-component regulatory systems that enable the bacteria to respond to different external stresses; tcrXY is one of these and has been suggested to be involved in regulating the genes required for suppressing intracellular growth 38 . Knockout of this gene has been reported to increase virulence in SCID mice, shortening their survival time 39 .
The present study has a number of limitations. As mentioned above, there is no consensus definition of virulence, and we here use the term interchangeably with "success". In our literature search, we looked for studies linking specific genes to virulence, not for genes in the functional category "virulence" (Table 2). One approach to testing virulence experimentally is to measure growth, either in vitro or in vivo 40 ; this is outside the scope of this study. Furthermore, it has been suggested that repetitive regions, such as the pe, ppe and pe_pgrs genes, hold a key to understanding Mtb virulence 41,42 . In this analysis, these regions were omitted because they are difficult to sequence reliably with the method used. This limitation could be overcome in the future with long-read sequencing, such as PacBio or Oxford Nanopore MinION.
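Omitting repetitive regions such as the pe, ppe and pe_pgrs genes is commonly implemented by discarding SNP calls whose positions fall inside a list of masked intervals. The sketch below assumes 1-based inclusive (start, end) coordinates and uses hypothetical intervals and positions; real coordinates would come from the H37Rv annotation, and the function names are illustrative.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Sort and merge overlapping or adjacent 1-based inclusive (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def unmasked(positions, intervals):
    """Keep only SNP positions that fall outside every masked interval."""
    mask = merge_intervals(intervals)
    starts = [start for start, _ in mask]
    kept = []
    for pos in positions:
        i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
        if i < 0 or pos > mask[i][1]:
            kept.append(pos)
    return kept


# Hypothetical repeat-region intervals and SNP positions.
print(unmasked([5, 15, 25, 35], [(18, 30), (10, 20)]))  # [5, 35]
```

Merging first keeps the intervals sorted and disjoint, so a single binary search per position is enough to decide whether it is masked.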
In conclusion, our data show that the success of C2 cannot readily be attributed to acquisition of antibiotic resistance or to demographic factors such as nationality and ethnicity. We suggest that bacterial genetic factors (such as polymorphisms in virulence-related genes), likely in combination with poor TB control, are the main contributors to the success of C2. Our identification of C2-specific polymorphisms in virulence-related genes constitutes a valuable basis for studying Mtb virulence, and our results will facilitate comparative studies as more sequencing data sets from outbreak strains become available.

Materials and Methods
Study population. Over the study period, 1992-2014, the incidence of TB in Denmark ranged from 6 to 12 per 100,000 (7.1 in 2014) 43-45 .
Initial processing of strains. All culture-positive strains at the International Reference Laboratory of Mycobacteriology (IRLM), a biosafety level 3 (BSL-3) certified laboratory, undergo antimicrobial susceptibility testing and genotyping. Genotypic susceptibility testing was initially done with a Line Probe Assay from Hain (Nehren, Germany), testing for rifampicin (RMP) and isoniazid (INH) resistance. Phenotypic drug susceptibility testing was done by sub-culturing in Dubos medium with 0.045% Tween 80 (SSI Diagnostika, Hilleroed, Denmark) and incubating at 37 °C. After 2 weeks of incubation, 100 µL of positive culture medium was inoculated on a blood agar plate, incubated at 35 °C and checked for growth of other microorganisms after 48 hours, and microscopy was performed. If no contaminants were found, 500 µL was transferred to an MGIT tube, and after 2-3 days, when positive, the bacterial concentration in the liquid medium was adjusted to equal density at 580 nm by adding Dubos-Tween. One mL of positive broth was diluted in 4 mL of sterile saline, and 0.5 mL was used for each drug.
Subtyping was done with RFLP, as described elsewhere 46 , and with MIRU-VNTR. For MIRU-VNTR, DNA extraction was performed directly from stock, and typing was performed as described by Supply et al. 47 with a commercial kit (Genoscreen, Lille, France) and processed on a 48-capillary ABI 3730 DNA Analyzer (Applied Biosystems, CA, USA). MIRU-VNTR alleles were assigned using GeneMapper software (Applied Biosystems, CA, USA) or BioNumerics (Applied Maths, Sint-Martens-Latem, Belgium).
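MIRU-VNTR alleles are generally called by converting the sized PCR fragment into a repeat copy number: copies = (fragment size minus flanking sequence length) / repeat unit length. GeneMapper and BioNumerics perform this against locus-specific allele tables; the sketch below only illustrates the arithmetic, and the flank length, repeat unit length and tolerance are hypothetical placeholders, not values from the Supply et al. scheme.

```python
def repeat_copies(fragment_bp: float, flank_bp: int, unit_bp: int, tolerance_bp: float = 10.0):
    """Return the repeat copy number for a sized amplicon, or None if it fits no allele."""
    copies = round((fragment_bp - flank_bp) / unit_bp)
    if copies < 0:
        return None
    expected_bp = flank_bp + copies * unit_bp
    if abs(fragment_bp - expected_bp) > tolerance_bp:
        return None  # size falls between alleles; flag for manual review
    return copies


# Hypothetical locus: 150 bp of flanking sequence and a 53 bp repeat unit.
print(repeat_copies(311.5, flank_bp=150, unit_bp=53))  # 3
print(repeat_copies(335.0, flank_bp=150, unit_bp=53))  # None
```

Returning None rather than the nearest integer makes intermediate sizes visible, since such fragments may indicate partial repeats or sizing errors rather than a real allele.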
WGS. DNA was extracted from the first 200 isolates as previously described 48 and subsequently, to sequence the remaining almost 800 isolates more quickly, as described by Votintseva et al. 49 . In brief, 1 mL of an MGIT-enriched culture was centrifuged at 13,000 RPM for 10 min, the supernatant was removed, and the pellet was resuspended in 400 µL water, heat-inactivated at 95 °C for 15 min and sonicated for 15 min at 65 °C. The supernatant was mixed with 1/10th volume of 3 M sodium acetate and 2 volumes of ice-cold 96% EtOH, vortexed and incubated at −20 °C for 1 h. After centrifugation, the pellet was washed with 70% EtOH, air-dried and re-suspended in 50 µL Tris-EDTA (TE) buffer by heating. The supernatant was transferred to a plate and cleaned with AMPure XP beads according to the manufacturer's protocol.
Library preparation and variant calling were performed as previously described 11 . High-quality single nucleotide polymorphism (SNP) positions were retained if at least one sample had a coverage of at least four reads in each direction and a SNP frequency of at least 85%. To robustly identify universally conserved and abundant SNPs in C2, we randomly subsampled 500 of the 952 strains ten times and retained only positions identified as universally conserved (monomorphic) or abundant (present in >4 samples) in more than one subsample. The presence of universally conserved and abundant SNPs was then verified for all samples using less stringent criteria (minimum read coverage of 3 and minimum SNP frequency of 70%). The ratio of the non-synonymous substitution rate per non-synonymous site to the synonymous substitution rate per synonymous site (dN/dS ratio) was calculated as previously described 26 .
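The filtering and subsampling steps above can be sketched as follows. This is not the pipeline of ref. 11: the `SiteCall` record, the function names and the reading of "more than once" as "in at least two of the ten subsamples" are assumptions made for illustration. Positions present in all sampled strains count as monomorphic, and positions in at least 5 sampled strains (">4 samples") count as abundant.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass
class SiteCall:
    forward_depth: int    # reads covering the site on the forward strand
    reverse_depth: int    # reads covering the site on the reverse strand
    snp_frequency: float  # fraction of reads carrying the alternative allele


def passes(call: SiteCall, min_depth: int = 4, min_freq: float = 0.85) -> bool:
    """Strict defaults; min_depth=3, min_freq=0.70 gives the lenient verification step."""
    return (call.forward_depth >= min_depth
            and call.reverse_depth >= min_depth
            and call.snp_frequency >= min_freq)


def keep_position(calls) -> bool:
    """A position is retained if at least one sample passes the strict filter."""
    return any(passes(c) for c in calls)


def robust_snps(snps_by_sample, subsample_size=500, replicates=10, min_abundant=5, seed=0):
    """Return (monomorphic, abundant) positions flagged in more than one subsample.

    snps_by_sample maps sample ID -> set of SNP positions that passed filtering.
    """
    rng = random.Random(seed)
    samples = sorted(snps_by_sample)  # fixed order so a given seed is reproducible
    mono_hits, abundant_hits = Counter(), Counter()
    for _ in range(replicates):
        drawn = rng.sample(samples, subsample_size)
        counts = Counter(pos for s in drawn for pos in snps_by_sample[s])
        for pos, n in counts.items():
            if n == subsample_size:
                mono_hits[pos] += 1
            if n >= min_abundant:
                abundant_hits[pos] += 1
    mono = {pos for pos, h in mono_hits.items() if h > 1}
    abundant = {pos for pos, h in abundant_hits.items() if h > 1} - mono
    return mono, abundant
```

For the study data this would be called as `robust_snps(snps_by_sample)` with the defaults (500 of 952 strains, 10 replicates), followed by re-checking every sample with `passes(call, min_depth=3, min_freq=0.70)`.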
All genes carrying monomorphic non-synonymous (NS) SNPs were used in a literature search for reports of their involvement in virulence.
Data availability. The data generated during the current study are available in the EMBL-EBI European Nucleotide Archive (ENA) under study accession PRJEB20214. https://www.ebi.ac.uk/ena/data/view/PRJEB20214
Ethical considerations. This study was approved by the Danish Data Protection Agency (Jnr. 2012-54-0100). In accordance with Danish law, observational studies performed in Denmark do not need approval from the Medical Ethics Committee or written consent from subjects. All analyses are presented anonymously.