Subspecies-specific sequence detection for differentiation of Mycobacterium abscessus complex

Mycobacterium abscessus complex (MABC) is a taxonomic group of rapidly growing, nontuberculous mycobacteria that are found as etiologic agents of various types of infections. They are considered as emerging human pathogens. MABC consists of 3 subspecies—M. abscessus subsp. bolletti, M. abscessus subsp. massiliense and M. abscessus subsp. abscessus. Here we present a novel method for subspecies differentiation of M. abscessus named Subspecies-Specific Sequence Detection (SSSD). This method is based on the presence of signature sequences present within the genomes of each subspecies of MABC. We tested this method against a virtual database of 1505 genome sequences of MABC. Further, we detected signature sequences of MABC in 45 microbiological samples through DNA hybridization. SSSD showed high levels of sensitivity and specificity for differentiation of subspecies of MABC, comparable to those obtained by rpoB sequence typing.

Here we present a novel method for subspecies differentiation of M. abscessus named SubSpecies-Specific Sequence Detection (SSSD). This method is based on the presence of signature sequences within the genomes of all MABC subspecies. The method was validated against a virtual database of 1505 MABC genome sequences from across the world, and it was highly effective for subspecies-level discrimination. Furthermore, the method showed to be valid for discrimination of subspecies of MABC in laboratory conditions through DNA hybridization. Its differentiation accuracy in terms of sensitivity and specificity was comparable to that obtained with rpoB sequencing. DNA sequencing, as a technique, is rarely available for diagnostic laboratories settled outside the most developed countries of Western Europe and the United States. SSSDs, as a method, forms a solid base for further development with various molecular biology techniques and fills the gap for less developed countries or laboratories with lower resources. The principal of SSSD is the detection of specific coding sequences. These sequences, or products of these genes, can be detected by several techniques, including the simplest PCR, DNA hybridization, or immunodetection. They may also be detected by more sophisticated methods like MALDI-TOF or mass spectrometry, and DNA sequencing. We expect that SSSD will facilitate the differentiation of subspecies of M. abscessus in laboratories where specialized equipment is not readily available.  (Table S2). The ANI values for the most closely related species of mycobacteria, Mycobacterium chelonae, Mycobacterium porcinum, Mycobacterium farcinogenes, Mycobacterium fortuitum, and Mycobacterium immunogenum ranged between 74 and 88%. Based on combined MLST and ANI score differentiation, we identified 63% (n = 941) strains in the database as M. abscessus. subsp. abscessus, 30% (n = 454) as M. abscessus subsp. massiliense and 7% (n = 110) as M. abscessus. abscessus subsp. bolletii (Fig. 1). We used ANI identification as the gold standard, which allowed us to estimate the sensitivity and specificity of the SSSDs and other methods used for subspecies identification.

Differentiation of genome sequences included in the virtual database.
Upon full-length rpoB sequence typing of genomes included in the virtual MABC genome database, we observed 97 distinct rpoB alleles. There were 213 variable sites within this gene, with a total of 220 mutations. The average number of nucleotide differences was 38.118. All rpoB gene sequevars, including those of three reference strains of MABC subspecies, grouped into three branches on the phylogenetic tree (Fig. 2). We found 939 strains of M. abscessus subsp. abscessus, 456 strains of M. abscessus subsp. massiliense, and 110 strains of M. abscessus subsp. bolletii. Both sensitivity and specificity of rpoB typing for differentiation of MABC subspecies were high, exceeding 98% (Table 1). Overall, rpoB typing enabled accurate identification of 99.1% (n = 1491) of MABC strains.
For SSSD differentiation, our custom virtual database of MABC genome sequences was BLAST-searched to identify sequences specific for each of the MABC subspecies. We typed 1489 (98.9%) out of 1505 genomes at the subspecies level. We obtained ambiguous results for 16 genomes. These genomes represented four MLST patterns. For two MLST patterns, ST109 type (n = 1), and new MLST pattern (DF) (n = 13), the ambiguities resulted from simultaneous detection of bol-s-s and mas1-s-s or mas2-s-s. gANI analysis identified these strains as M. abscessus subsp. bolletii. One strain, new MLST pattern (BD), was recognized by mas1-s-s and abs-s-s, and one Excluding strains with ambiguous results, a sequence-specific for M. abscessus subsp. abscessus (abs-s-s) was found in 910 (60.5%) genomes, whereas sequence specific for M. abscessus subsp. bolletii (bol-s-s)-in 96 genomes. Sequences mas1-s-s and mas2-s-s were detected in 431 and 467 genomes, respectively. We concurrently detected both sequences specific for M. abscessus subsp. massiliense in 399 strains. Either of the two sequences were found in 483 strains of M. abscessus subsp. massiliense.
Validation of SSSD on clinical isolates. The SSSD method was validated using 45 strains of MABC collected in three countries-Canada, South Korea, and Slovenia and three closely related species of M. chelonae (two strains), M. fortuitum (one strain), and M. porcinum (one strain). The strains were typed using methods described in the Materials and Methods section of this manuscript. The probes targeting MABC-s-s and abs-s-s were 100% sensitive and specific (Fig. 3A,B, respectively ). The probes bol-s-s and mas1-s-s detected specific sequences in all respective strains and one strain of M. chelonae (Fig. 3C,D, respectively). Probe hybridizing to mas2-s-s detected 11 out of 13 strains (84%) of M. abscessus subsp. massiliense (Fig. 3E). Overall, the method of SSSD correctly typed species and subspecies of all clinical isolates included in this study (Table 2).

Discussion
The population of strains included within the virtual database reflects the population of infecting strains, with the highest prevalence of strains of M. abscessus subsp. abscessus, followed by M. abscessus subsp. massiliense and M. abscessus subsp. bolletii. Among 1505 genome sequences, we identified 131 MLST patterns that were previously unreported, which shows that there is still a significant level of variability to be discovered within the genus of M. abscessus.  The SSSD developed in this study reached satisfactory levels of sensitivity and specificity upon in silico, genome-based analysis, and validation procedure on DNA from 45 clinical isolates. When we validated SSSD on clinical strains, we found that the probes detecting bol-s-s and mas1-s-s detected specific sequences in all respective strains and one strain of M. chelonae. We suspect a possibility of horizontal gene transfer between M. abscessus and M. chelonae. The sequences that we detected in this study are present in only one out of three subspecies of M. abscessus complex. We suspect that these sequences were not present in the ancestral strain of M. abscessus complex, and ancestral strains of each subspecies might have acquired them from other bacteria during the process of phylogenetic differentiation. Due to possible cross-reactivity of SSSDs in other species, we recommend the use of M. abscessus specific sequence to identify species of the bacteria, together with subspecies specific sequences. Relying only on the detection of subspecies specific sequences only may result in erroneous identification.
We found that the results obtained by SSSD were not utterly concordant with the results obtained with gANI. We made the same observation for other methods of strain differentiation, rpoB sequencing, and erm(41) typing. We suspect that it is a result of the horizontal gene transfer between bacteria within the complex. This phenomenon was suggested as an important factor contributing to the evolution of M. abscessus 26,27 . M. abscessus is considered a pathogen with open, non-conservative pangenome, which means it should be monitored closely for the acquisition of pathogenic traits 28 . Tan et al. found that recombination MABC more than mutations and natural selection. After a comparison of 243 genomes of M. abscessus, they found that the recombined sequences in the three subspecies have come from different intra-species and inter-species origins 27 . The observations of Tan et al. reflected in our study. There are reports that recombination occurs more frequently in M. abscessus subsp. massiliense than in the other two subspecies 27 , which can explain why two subspecies specific sequences are necessary to discriminate M. abscessus subsp efficiently. massiliense from other members of the complex. Further, for all typing methods, sequences that are, in general, specific for one subspecies, have been identified in bacteria, ultimately belonging to distinct subspecies.  www.nature.com/scientificreports/ The genetic distinction of the subspecies within the complex, suggests the existence of reproductive isolation barriers between bacteria. The extent of these barriers, their mechanism of action, both within and outside the complex, remain to be understood. Importantly, during this research, we observed cross-hybridization of the sequences that detected genes of subspecies of MABC (bol-s-s and mas1-s-s) in a clinical strain of M. chelonae during DNA hybridization. Perhaps, the reproductive barriers are permeable not only between the subspecies of MABC but also between closely related species of mycobacteria. The existence of horizontal gene transfer is a potential limitation of the accuracy of the method that we present.
One of the most significant advantages of SSSD is that the method relies on the identification of a stretch of a particular coding DNA sequence. Therefore, this method is adaptable to a simple, time-and cost-efficient method, which can be an alternative for sequencing-based methods of differentiation in laboratory conditions. In this study, we used DNA hybridization to show that SSSD is efficient in discriminating subspecies of MABC in laboratory conditions. The protocol involved growing bacteria on agar plates, DNA isolation, and DNA hybridization. The entire experiment took approximately a week. However, this method is amenable to more time-and cost-efficient protocols with other molecular biology techniques, for example, microarrays 29 or line probe assays based on reverse hybridization 30 . Subspecies specific sequences can easily be detected using the simplest PCR. Importantly, as showed by the results of this study, the method of SSSD can be exploited for quick and efficient analysis of data obtained by whole-genome sequencing. The bioinformatic methods based on the comparison of entire genomes, like gANI, still require large amounts of computer power, which is neither practical nor accessible for many laboratories. In turn, the BLAST search of the sequence of interest takes seconds. Finally, the use of coding sequences for differentiation allows the possibility to identify M. abscessus and its subspecies in antigen-based cassette tests. The principal of the method fills the gap for M. abscessus identification to the subspecies level, where DNA sequencing is not available.

Materials and methods
Construction of virtual database. Genome sequences of 1505 M. abscessus strains were downloaded from Genome Database at the NCBI. Information on the geographic origin of the strains was retrieved from the Genome Database (Table S1). The strains included in the database were isolated in 14 countries from across the globe (Fig. S1, Table S1). All genome sequences consisted of less than 145 contigs in order to ensure good quality of extracted sequences. The cut-off for the number of contigs, for each strain, was linked to the N50 value, typically used to describe the quality of draft assembly. The N50 value is defined as the shortest sequence length at 50% of the genome. The lowest probable N50 value in our dataset was 35 kbp, with an average N50 of 384.13 ± 18.06 kbp.
Virtual database strain differentiation. The sequences were processed and analyzed with Geneious R11 (Biomatters, Auckland, New Zealand) 31 . We compiled genomic sequences of M. abscessus strains into a custom database. We searched the database with the BLAST tool implemented in Geneious R11 (Megablast, max e-value 1e−4) for the presence of the rpoB, erm(41) gene sequences and another seven gene sequences (argH, cya, glpK, gnd, murC, pta , and purH.) used in the MLST typing. Reference sequences for BLAST search were those of strains NC_010397 (reference strain M. abscessus ATCC 19977) for M. abscessus subsp. abscessus, NZ_ CP014950 for M. abscessus subsp. bolletii, and NC_018150 for M. abscessus subsp. massiliense. Sequences found through BLAST search were aligned with Geneious R11. The strains were differentiated to the subspecies level based on rpoB and erm(41) sequence typing and MLST typing, as previously described 32 . For rpoB sequences, a phylogenetic tree was built with RAxML implemented within Geneious R11. Sequence variability was analyzed with DnaSP 33 . For erm(41), strains were assigned to individual subspecies upon analysis of length of the sequences found during BLAST search, as erm(41) sequence typing differentiates M. abscessus subs. massiliense from other members of the complex based on the absence of the C-terminal end of the gene. For SSSD, strains were categorized into individual subspecies, based on the presence of particular sequences of a given strain (Table 3). These sequences were identified on a hit list after the BLAST search of the custom virtual M. abscessus database. The reference sequence used to identify M. abscessus subsp. abscessus (M. abscessus subsp. abscessusspecific sequence, abs-s-s) was a gene sequence encoding hypothetical protein MAB_3505c. For identification of M. abscessus subsp. bolletii we used sequence of aac1 gene encoding ADP/ATP carrier protein (bol-s-s), located between homologs of MAB_1240 and MAB_1241c. For identification of M. abscessus subsp. massiliense we used two sequences. The first sequence (mas1-s-s) was annotated as a steroid dehydrogenase and was located between homologs of MAB_1115 and MAB_1116. The second sequence (mas2-s-s) was annotated as tetR gene and located within homolog of MAB_2150c.  (Table S1). In order to designate the subspecies for each MLST pattern, strains showing distinct MLST patterns were chosen for subspecies delineation by genome average nucleotide identity estimation (gANI) 11,35 . gANI is a similarity index between a given pair of genomes. Briefly, 181 genomes of M. abscessus showing distinct MLST patterns were annotated with DFAST 36 . Predicted coding sequences were used for calculating pairwise gANI with ANIcalculator 37 (Fig. S2, Table S2). Data processing and visualization were performed with Python 3 scripts using pandas and seaborn libraries. Additional ANI analysis was performed for 47 genomes of MABC, where subspecies delineation results were not concordant for all methods (Fig. S3, Table S3). In the case of discrepancies, the delineation of species by ANI was considered as correct.
Bacterial strains. All strains were isolated from pulmonary patients from hospitals in South Korea, Canada, and Slovenia (Table S4) DNA hybridization. Bacteria were grown on nutrient broth agar plates for 3-5 days at 37 °C. Cells were mechanically disrupted by bead beating, and genomic DNA was isolated with DNAzol (Invitrogen, Carlsbad, USA). Next, 3 μg of each DNA sample, denatured by heating, was loaded onto Hybond-N + membrane (GE Healthcare Life Sciences, Amersham, UK) and crosslinked with UV Stratalinker 1800 (Stratagene, San Diego, USA). DNA probes for DNA hybridization were amplified with Taq polymerase (Sigma-Aldrich, Missouri, USA) ( Table 4). PCR products were purified using the QIAEXII Gel Extraction Kit (Qiagen, Hilden, Germany) (Figs. S4-S7). The specific sequence for the detection of MABC was described previously 38 . For the detection of MABC, we used a probe for gene encoding uncharacterized protein MAB_0974. For SSSD of subspecies of MABC, we used probes hybridizing to abs-s-s encoding hypothetical protein MAB_3505c, bol-s-s, located between homologs of MAB_1240 and MAB_1241c, mas1-s-s annotated as a steroid dehydrogenase and located between homologs of MAB_1115 and MAB_1116, and mas2-s-s annotated as tetR gene and located within homolog of MAB_2150c. Probes were amplified with DNA of strains M. abscessus subsp. abscessus A10, M. abscessus subsp. bolletii B5, and M. abscessus subsp. massiliense M9 and included in this study. Probes were labeled with AlkPhos Direct Labelling and Detection System (GE Healthcare Life Sciences, Amersham, UK). Various temperatures and salt concentrations in buffers were used during DNA hybridization in order to achieve high stringency of the blot. For detection of MABC and M. abscessus subsp. abscessus, membranes with sample DNA were hybridized with labeled probes at 85 °C overnight in hybridization buffer (AlkPhos Direct Labelling and Detection System protocol, GE Healthcare Life Sciences, Amersham, UK). Next, membranes were washed in primary and secondary wash solutions according to the manufacturer's instructions. Afterward, membranes were incubated with detection reagent (CDP-Star, GE Healthcare Life Sciences, Amersham, UK) and gently agitated for 5 min. The membranes were then placed in the cassette with a radiography film (CL-XPosure Film, ThermoFisherScientific, Massachusetts, USA) for one hour and developed with Medical X-ray Processor (Kodak, Rochester, USA). For detection of M. abscessus subsp. bolletii and M. abscessus subsp. massiliense (probes mas1 and mas2) hybridization was performed at 75 °C. Also, the NaCl concentration in hybridization buffer, primary, and secondary washing solution was lowered to 50 mM and 15 mM, respectively.

Data availability
All data generated or analysed during this study are included in this published article (and its Supplementary Information files).