Whole-genome sequencing-based epidemiological analysis of anti-tuberculosis drug resistance genes in Japan in 2007: Application of the Genome Research for Asian Tuberculosis (GReAT) database

We investigated the lineages of Mycobacterium tuberculosis (Mtb) isolates from the RYOKEN study in Japan in 2007 and the usefulness of genotypic drug susceptibility testing (DST) using the Genome Research for Asian Tuberculosis (GReAT) database. In total, 667 isolates were classified into lineage 1 (4.6%), lineage 2 (0.8%), lineage 2/Beijing (72.1%), lineage 3 (0.5%), and lineage 4 (22.0%). The nationality, gender, and age groups associated with the isolates assigned to lineage 1 were significantly different from those associated with other lineages. In particular, isolates of lineage 1.2.1 (EAI2) formed sub-clusters and included a 2,316-bp deletion in the genome. The proportion of the isolates resistant to at least one anti-tuberculosis (TB) drug was 10.8%, as determined by either the genotypic or phenotypic method of DST. However, the sensitivities to isoniazid, streptomycin, and ethambutol determined by the genotypic method were low. Thus, unidentified mutations in the genome responsible for drug resistance were explored, revealing previously unreported mutations in the katG, gid, and embB genes. This is the first nationwide report of whole-genome analysis of TB in Japan.


Materials and Methods
Strain collection and drug susceptibility testing. All Japanese strains were collected in 2007 by Tuberculosis Research Committee (RYOKEN) Japan a nationwide coalition of TB hospitals in Japan that report DR rates once every 5 years 11 . These isolates were collected from August 2007 to July 2008. In total, 329 isolates collected from patients aged >40 years were randomly selected from the registered patients in this study to investigate the proportion of foreign-born TB patients. In addition, 338 isolates collected from patients aged 0-39 years were also included for analysis. WGS of the isolates was performed using an all-in-one web-based tool for genotyping of Mtb, namely, Total Genotyping Solution for TB (TGS-TB). All isolates were subjected to phylogenetic analysis, in silico spoligotyping, and DR prediction 12 . The concordance, specificity, and sensitivity of prediction of drug susceptibility by TGS-TB were calculated by comparing the results of phenotypic drug susceptibility testing (DST), which was used as a gold standard. The phenotypic susceptibility of the isolates to anti-tuberculous drugs [i.e., isoniazid (INH), rifampicin (RFP), streptomycin (SM), ethambutol (EB), and levofloxacin (LVFX)] was calculated using the proportion method 13 . The genotyping results of the isolates were compared with the results of SNV-based genotype analysis on TGS-TB 12 . DNA preparation and WGS of the Mtb isolates. Genomic DNA was extracted using the DNA ISOPLANT Kit (Wako Pure Chemical Industries, Ltd., Osaka, Japan), purified with a QIAquick column (QIAGEN GmbH, Hilden, Germany), and quantified using a Qubit 2.0 fluorometer (Thermo Fisher Scientific, Waltham, MA, USA). Paired-end libraries were prepared from 50 ng of purified DNA with the QIAseq FX DNA Library Kit (QIAGEN GmbH, Hilden, Germany) in accordance with the manufacturer's protocol. The average fragment size (500-600 bp) of the DNA libraries was estimated by 2% agarose gel electrophoresis. Then, the fragments were eluted using the Wizard SV Gel and PCR Clean-Up System (Promega Corporation, Madison, WI, USA). The 24 purified DNA libraries were pooled, and the DNA concentration was quantified with a Qubit 2.0 fluorometer. The pooled libraries (11 pM) were sequenced on an Illumina MiSeq system (Illumina, Inc., San Diego, CA, USA) with the MiSeq Reagent Kit ver. 3 (600 cycles), which showed that the first paired-end reads were 350 nt in length, whereas the second paired-end reads were 250 nt in length.
Genomic analysis of the isolates with informatic tools. The reads obtained from sequencing were analysed using TGS-TB 12 , which is a pipeline for conventional epidemiological analysis. Prediction of genetic markers for antimicrobial resistance (e.g., ahpC, embA, embB, embC, embR, ethA, ethR, gid, gyrA, gyrB, inhA, kasA, katG, pncA, rpoB, rpoC, rpsA, rpsL, and rrs) listed in the TB profiler database 14 , lineage analysis (i.e., lineage 1, lineage 2, lineage 2/Beijing, lineage 3, and lineage 4) based on single-nucleotide polymorphisms followed by KvarQ 15 , and in silico spoligotyping were automatically performed based on the sequence data. Core-genome phylogenetic and linkage networks were also analysed using TGS-TB 12 . Prior to in silico genotyping, the adapter sequences were trimmed from the short reads, and low-quality bases with a Phred score of <15 were eliminated 12 using the Skewer program to obtain sequences that were at least 50-mers 16 . The remaining reads were mapped using the BWA-mem program 17 with the reference genome sequence of Mtb strain H 37 Rv (NC_000962.3) 18 . Reliable SNV sites with at least a 5× coverage depth and a Phred score of ≥20 were selected 12 . In this study average coverage depth was 85. Maximum likelihood phylogenetic analysis of all concatenated SNV alleles was performed using RAxML v8.2.0 19 with 1,000 bootstrap iterations. To identify epidemiological linkages among the isolates, data from queries for isolate-specific genes or the abovementioned reference genomes were downloaded as a NEXUS format file to visualize linkage networks, such as by the median-joining method for network visualization using PopART (http://popart.otago.ac.nz). In silico spoligotyping was performed by a search using the Basic Local Alignment Search Tool with 43 spacer sequences 20 .
Statistical analysis. Data are summarized as the mean, median, and/or range, as appropriate, and compared using Fisher's exact test or the chi-square test. All tests were two-sided, and a probability (p) value of < 0.05 was considered statistically significant.

Isolates from Japanese and foreign-born patients. As shown in
Phylogenetic analysis of lineage 1 isolates from the RYOKEN 2007 dataset of the GReAT database. A phylogenetic tree of 31 lineage 1 isolates from the RYOKEN 2007 dataset of the GReAT database generated by SNV-based analysis of the core genome with TGS-TB is shown in Fig. 1. No identical or closely related isolates were observed ( Fig. 1 and Supplemental Table 1). The lineage 1 isolates were classified into three sub-lineage types. Of the 31 isolates, 27 (87.1%) were identified as lineage 1.2.1 [East-African-Indian (EAI)], two as lineage 1.2.2 (EAI1), and two as lineage 1.1. 1.1 (EAI4). The isolates identified as lineage 1.2.2 (EAI1) were from foreign-born patients (Indonesia and Nepal). The isolates belonging to lineage 1.1.1.1 (EAI4) were also from foreign-born patients, one of whom was born in Vietnam. The isolates belonging to lineage 1.2.1 (EIA2) formed two clusters, namely, EAI2 sub-lineage 1 and EAI2 sub-lineage 2. Of the 13 EAI2 sub-lineage 1 isolates, 9 (69.2%) were from foreign-born patients, with most (8/9, 88.9%) from female patients from the Philippines, including three that were resistant to INH (Fig. 1). Furthermore, a 2,316-bp deletion (NC_000962.3, 4056664-4058980) was detected in the genomes of all isolates of sub-cluster EIA2 sub-lineage 1 (Supplemental Fig. 2). Of the 14 EAI2 sub-lineage 2 isolates, 7 (50%) were from foreign-born patients (Thailand or the Philippines) (Fig. 1). Among these EAI2 sub-lineages, there was no significant difference in the results of spoligotyping, which analysed the spacer regions of direct repeats in the genome of Mtb (Supplemental Table 2). A large deletion was observed in the EAI2 sub-lineage 1, which was the largest sub-lineage from the Philippines (data not shown). Taken together, the results show that this large deletion in sub-lineage 2 could be useful for identification of sub-lineage 1.2.1 (EAI2).

Discussion
This study was the first to identify the genetic characteristics of Mtb throughout Japan by WGS. The genomic analysis showed that the largest group of isolates belonged to lineage 2/Beijing, accounting for 485 (72.1%) of the 667 isolates analysed, whereas lineage 4 accounted for 147 (22.0%), and lineage 1 accounted for 31 (4.6%). Epidemiological analysis by WGS of foreign-born TB patients in countries with low TB prevalence has been conducted in the U.S. 21 , Canada 22 , Spain 23 , Italy 24 , and Germany 25 . Although most of these studies found limited evidence of the transmission of TB between foreign-born and native-born patients, the transmission of TB between foreigners and Japanese patients was not observed in the present study. Consistent with the increasing numbers of immigrants from Asian countries with high TB burdens, such as the Philippines, China, Vietnam, Nepal, and Indonesia, to Japan, the number of TB cases among foreign-born patients has also been increasing, going from 842 (3.3%) of 25,311 cases in 2007 to 1,338 (7.6%) of 17,625 cases in 2016. Of the 1,338 foreign-born TB patients in Japan identified in 2016, 318 (23.8%), 272 (20.3%), 212 (15.8%), 135 (10.1%), and 90 (6.7%) were from the Philippines, China, Vietnam, Nepal, and Indonesia, respectively (Tuberculosis in Japan: Annual Report 2017, http://www.jata.or.jp/rit/ekigaku/en/statistics-of-tb/). Therefore, to control TB from foreign-born patients, genetic markers from patients from each of these countries should be identified to predict the invasion and transmutation of TB.
As shown in Table 2, of the 31 isolates identified as lineage 1, 20 (64.5%) were from foreign-born TB patients ( Table 2). The proportion of lineage 1 isolates from foreign-born TB patients in the Philippines was quite high compared with that of other lineages (0% for lineage 2, 2.5% for lineage 2/Beijing, 0% for lineage 3, and 5.4% for lineage 4). Kobayashi et al. reported that in Tokyo, the proportion of lineage 1 isolates was significantly higher in foreign-born patients than in Japanese patients 26 . Therefore, a greater prevalence of lineage 1 isolates would be expected from foreign-born patients. There were significant differences in the prevalence of lineage 1 isolates among groups, with foreign-born females aged 20-39 years being the most frequently infected. In particular, 14 isolates identified as lineage 1 (EAI2) from females from the Philippines aged 20-39 years formed www.nature.com/scientificreports www.nature.com/scientificreports/ a sub-cluster. These isolates were not closely related, as there were 86 to 289 SNVs among them (Supplemental Table 1). However, all 14 of these isolates had a large deletion of approximately 2.3 kb at the same position in the genome (Supplemental Fig. 2). The distribution of TB lineages in the present study was similar to that described www.nature.com/scientificreports www.nature.com/scientificreports/ in other reports using different methods, such as spoligotyping 20,27 . However, in the present study, spoligotyping was not able to distinguish the EA12 sub-lineage (Supplemental Table 2). The large deletion in sub-lineage 1 was also observed in a majority of the isolates from the Philippines in the GReAT database (unpublished data). Therefore, this deletion might be a heritable characteristic of sub-lineage 1 and could thus serve as a candidate genetic marker of the isolates originating from the Philippines.
The proportion of DR to at least one anti-TB drug was 9.6% in 2007 in Japan 11 , comparable to the value of 10.8% observed in the present study (Supplemental Table 4). The proposed method seems promising as a standard method for DST in the future. Indeed, the concordance and specificity of the genotyping method for predicting drug susceptibility was good compared with the method currently used (Table 3). However, the sensitivity differed among drugs (70.8%, 81.3%, 50.0%, and 100% for INH, SM, EB, and RFP, respectively) ( Table 3).
The GReAT database and bacterial culture-based methods can also improve the prediction of DR in Mtb by detecting genomic mutations responsible for DR and reducing the time necessary for DR detection, from weeks and months to within days or hours 28 . To detect the mutation of the rpoB gene responsible for RFP resistance, gene targeting methods, such as line probe assays or the Cepheid Xpert Mtb/RIF assay, can provide DST results within days or hours. However, these methods can detect the most frequent mutations of the rpoB gene but are limited to RFP. With future improvements in sequencing methods, mutations associated with resistance to other anti-TB drugs can be obtained simultaneously by WGS. Bioinformatic tools, such as KvarQ 15 , TB Profiler 14 , PhyResSE 29 , CASTB 30 , Mykrobe Predictor 31 , and TGS-TB 12 , are also useful to predict DR to not only first-line but also second-line anti-TB drugs. However, predictions made with these tools are limited to the reported major mutations in the DR genes. Therefore, other relevant mutations in these genes may be unreported. In the present study, unreported mutations in the katG, gid, and embB genes responsible for resistance to INH and EB were identified (Supplemental Table 3). By updating the list of mutations in DR genes, these findings should contribute to the improvement of the prediction of DR in TB. conclusion WGS of 667 isolates from the RYOKEN 2007 dataset of the GReAT database revealed genomic markers of Mtb isolates from the Philippines (Fig. 1) and previously unknown mutations in genes associated with resistance to INH, SM, and EB (Supplemental Table 3) to improve genomics-based DST. This is the first report of WGS for anti-TB DST in Japan, showing the usefulness of the GReAT database.

Data Availability
Data supporting the findings of this manuscript are available from the corresponding author upon reasonable request. Nucleotide sequence data reported are available in the DDBJ Sequenced Read Archive under the accession numbers DRA008824, DRA008825 and DRA008826.  Table 3. Phenotypic DST, RYOKEN 2007, Japan. * Genotypic DR of the isolates was predicted by TGS-TB, TB Profiler and KvarQ in TGS-TB. † The methods of phenotypic DST of the isolates against isoniazid (INH), rifampicin (RFP), streptomycin (SM), ethambutol (EB), and levofloxacin (LVFX). R; resistant, S; susceptible. The DST method is described in the Materials and Methods section. The numbers of isolates is indicated in each column for anti-TB drugs. The concordance rate, sensitivity, and specificity of the genotypic method vs. phenotypic method are also indicated. The concordance, specificity, and sensitivity of prediction of drug susceptibility by TGS-TB were calculated by comparing the results of phenotypic DST, which was used as the gold standard.