Phylogenetic and genetic characterization of Treponema pallidum strains from syphilis patients in Japan by whole-genome sequence analysis from global perspectives

Japan has had a substantial increase in syphilis cases since 2013. However, research on the genomic features of the Treponema pallidum subspecies pallidum (TPA) strains from these cases has been limited. Here, we elucidated the genetic variations and relationships between TPA strains in Japan (detected between 2014 and 2018) and other countries by whole-genome sequencing and phylogenetic analyses, including syphilis epidemiological surveillance data and information on patient sexual orientation. Seventeen of the 20 strains in Japan were SS14- and the remaining 3 were Nichols-lineage. Sixteen of the 17 SS14-lineage strains were classified into previously reported Sub-lineage 1B. Sub-lineage 1B strains in Japan have formed distinct sub-clusters of strains from heterosexuals and strains from men who have sex with men. These strains were closely related to reported TPA strains in China, forming an East-Asian cluster. However, those strains in these countries evolved independently after diverging from their most recent common ancestor and expanded their genetic diversity during the time of syphilis outbreak in each country. The genetic difference between the TPA strains in these countries was characterized by single-nucleotide-polymorphism analyses of their penicillin binding protein genes. Taken together, our results elucidated the detailed phylogenetic features and transmission networks of syphilis.

Treponema pallidum subspecies pallidum (TPA) is the etiologic agent of syphilis. An estimated 6 million new cases are diagnosed globally every year, with the majority in low-and middle-income countries 1 . However, a resurgence of syphilis has been reported in developed countries worldwide beginning in the twenty-first century 2 , including a substantial increase in syphilis cases in Japan since 2013 3 .
To date, most molecular studies of TPA have been carried out by PCR amplification of several TPA genetic loci 4,5 . These have identified some epidemiological features related to the subtype and to the prevalence of macrolide resistance in strains in different countries 6 . Although whole-genome sequence (WGS) analyses have provided phylogenetic information on TPA strains in several countries [7][8][9][10][11][12][13][14] , those data have not necessarily been accumulated enough for comprehension of the overall aspects of the TPA circulations in the world.
The annual number of syphilis cases in Japan exceeded 7000 in 2018, which was the highest annual number of syphilis cases in 50 years 15 . However, research on the genomic features of the TPA strains from these cases has not been reported. Therefore, we conducted molecular typing and WGS analyses of TPA strains detected between 2014 and 2018 in Tokyo and Osaka prefectures which accounted for about 40% of all the cases in Japan. Then, we reconstructed the phylogenetic relationships between these TPA genomes and the publicly available genomes in other countries. Moreover, the genetic variations and global relationships of TPA strains, and the  www.nature.com/scientificreports/ in Japan and 11 TPA strains detected in China (Fig. 2a,b). The other SS14-lineage strain detected in Japan (strain 15A019HM, Fig. 2a) was in a sub-lineage that included TPA strains in European countries, although coming from a patient lacking information on nationality and a travel history.

Macrolide resistance and SNP analyses.
With regard to the macrolide resistance mutations in Sublineage 1 genomes, unlike Sub-lineage 1A 11 that contained four strains (UW186B, UW213B, UW228B, and UW254B) detected in the U.S, all of the macrolide-resistant strains in Sub-lineage 1B, the EAC, carried a single A2058G mutation in their 23S rRNA genes (Fig. 2b). In addition, 2 macrolide-sensitive strains were detected from MSM cases in Japan (derived from node V in Fig. 2b). Various amino acid changes in the penicillin binding protein (PBP) in the Treponema genome have been reported 9,11,12 . Especially, Sun et al. discussed on the probable relationships between these amino acid changes among the strains in China and drug pressure of penicillin 9 . Although there has been no evidence for the hypothesis that those amino acid changes have some effect on penicillin resistance, they might be useful as phylogenetic markers. This prompted us to investigate those changes among the strains in Japan. Therefore, based on previous studies 9,11 , we selected four PBP genes (pbp1 (TPANIC_0500), pbp2 (TPANIC_0760), mrcA (TPANIC_0705), and Tp47 (TPANIC_0574)) for this study. The mrcA A506V mutation was a unique feature in the genomes of EAC strains and was in all the genomes of TPA strains detected in Japan and China. However, the mrcA A506V mutation was not found in the genomes of other sub-lineage genomes, except strain CZ27 (Fig. 2a,b). The variants associated with pbp2 (TPANIC_0760) were only found in the genomes of TPA strains in China (Fig. 2a,b), and most (9 of 11) of these strains harbored pbp2 (TPANIC_0760) I415F or I415M mutation. Among the EAC strains detected in Japan, no genetic difference in the four genes (i.e., pbp1, pbp2, mrcA and Tp47) was observed. In addition, among the eight Nichols-lineage strains, only one strain detected in Japan (17A021MM) carried the pbp1 P564L mutation (Fig. 2a). This was the first example of a Nichols-lineage strain that carried this mutation.  A2058G  P564L  R312C  S394R  P44L  T174P  I487L  A506T  A506V  T623A  G708S  M625V  T109P  H112P  Q116R  S121P  A366T  I415F  I415M  D82G  Sexual orientation   23S  rRNA  TP0500  TP0574  TP0705  TP0760   2018  www.nature.com/scientificreports/ strains in China (from 7 men, 2 women, and 2 cases with missing data) 8,9,14 appeared to have increased their genetic diversity after the late 2000s. All but one of the TPA strains detected in Japan (15A011MM in Fig. 2a,b) formed distinct sub-clusters between strains from MSM cases and strains from heterosexual cases. Noticeably, the strain 15A011MM is very closely related to the strain in China, X-4, which was apparently separated from other strains in China (Fig. 2b). The MRCA of the 3 TPA strains from MSM cases was likely to be extant in 2009 (2008-2015, node VII in Fig. 2b), and that of the 12 TPA strains from heterosexual cases was likely to be extant in 2013 (2010-2016, node VIII in Fig. 2b). The genomes of the TPA strains in Japan expanded their genetic diversity mainly after the mid-2010s (Fig. 2b). Figure 3 shows the epidemiological trends of the primary and secondary syphilis cases reported in Japan and China between 1995 and 2016. Considering the epidemiological data and the results of the Bayesian temporal analysis for the EAC genomes, the genomes of the TPA strains in China appeared to have expanded their genetic diversity after about 2005 (Fig. 2b), which corresponded to the time of the second wave of syphilis cases in China (Fig. 3). The increase in the number of syphilis cases in Japan was after 2013 (Fig. 3), which corresponded to the time of increasing genetic diversity in TPA strains in Japan (Fig. 2b).

Discussion
In this report, we have described the genomic and phylogenetic features of TPA strains detected in Japan compared to TPA strains detected in other countries, in particular in China. A significant feature of this study was that our analysis included information on the gender and sexual orientation of the syphilis patients from whom TPA strains had been detected.
The maximum likelihood phylogenetic analysis and the Bayesian temporal analysis of the genomes of global TPA strains in this study found similar results to previous reports in terms of lineages and sub-lineages. In those reports, most of the SS14-lineage strains in American and European countries were classified in lineages SS14Ω-A 10,11 .
The majority of TPA strains analyzed in this study in Japan (16 of 20) were classified in the SS14-lineage and formed an EAC, designated Sub-lineage 1B (in lineage SS14Ω-B) in a previous study 11 , that included strains in China. In addition to these strains, there were 3 Nichols-lineage strains and a strain belonging to another SS14 sub-lineage, previously designated as Sub-lineage 8, which contained strains in the U.S. and European countries 11 . WGS indicated an ongoing concurrent circulation of Nichols-and SS14-lineage strains in Japan, as has also been observed in several American and European countries 10,11 . Recently, four TPE strains from Japan were reported between 2014 and 2018 16 . However, the definitions of subspecies of those strains were based on the sequencing of tp0548 and tp0856 genes, but were not based on WGS analyses 16 . Although we could not detect any TPE strain among the strains that passed our criteria described above and in "Methods", we have to keep monitoring the emergence and spread of TPE strains in Japan.
The EAC was composed of SS14-lineage TPA strains in China and Japan, with most TPA strains in Japan (16/17) being subtype 14d/f, which is the dominant subtype among both heterosexual and MSM cases in Japan 17 . We could not evaluate the difference between subtype 14d/f strains from MSM and heterosexual cases in our previous molecular typing studies 17,18 . The WGS study in this report elucidated the separate phylogeny of the strains from MSM and heterosexuals belonging to this same subtype (Table 1, and Fig. 2b).
These results underlined the importance of having information on gender and sexual orientation in analyzing WGS studies of TPA to comprehend the detailed aspect of the circulations of strains in the respective communities of MSM and heterosexuals.
From the geographical point of view, the experimental results for strains from heterosexual cases collected in Tokyo and Osaka prefectures were mixed, showing that genetically similar strains were circulating among heterosexuals in these two prefectures that have the largest populations in Japan.
Based on the phylogenetic analyses with the strains collected since 2011 to 2018 (Table S1, strains in China and Japan), the MRCA of the EAC strains and of strains in China appeared to emerge in 2006 (node III in Fig. 2b), followed by the MRCA of strains in Japan in 2007 (node V in Fig. 2b). Therefore, the EAC has separated into Chinese and Japanese clusters since the mid 2000s. The estimated time of emergence of the MRCA of the  www.nature.com/scientificreports/ Chinese cluster, determined in this study, was similar to that in a recent report 11 . The genomes of TPA strains in China and Japan then expanded their genetic diversity after the late 2000s and mid-2010s, respectively, which approximately corresponded to the time each country had an increasing number of primary and secondary syphilis cases (Fig. 3). This correspondence may be consistent with the fact that TPA is an obligate human pathogen and its accumulation of SNP sites increases with time. However, for the genomes of TPA strains in China, there was a discrepancy in that there was not an increase in their genetic diversity from the mid-1990s to the early 2000s (Fig. 2b), although the onset of the increase in the number of cases was in the late 1990s (Fig. 3). This fact may be attributed to detection bias owing to a limited number of samples included in this study. The simplest explanation is that genomes of TPA in China during the early stage of the syphilis outbreak had not been collected or analyzed systematically enough. Since contagious pathogens can cross borders, the EAC may be an example of cross-border propagation of TPA strains between Japan and China. Although the cause of the current large syphilis outbreak in Japan may be uncertain, an increase in the number of travelers from China to Japan has been noted. The Japan National Tourism Organization (https ://www.jnto.go.jp/jpn/stati stics /visit or_trend s/index .html) has reported that the number of Chinese travelers to Japan has increased rapidly since 2014, which corresponded with the onset of the recent syphilis outbreak (Fig. 3) mainly among heterosexuals in Japan. However, our results indicated that the MRCA of the TPA strains in the sub-cluster formed by the strains from heterosexuals in Japan was likely to be extant in 2013 (node VIII in Fig. 2b). In addition, based on their phylogeny, the MRCA of TPA strain 15A011MM and the strains from heterosexuals in Japan may have been extant in 2007 (node V in Fig. 2b). Therefore, we consider the hypothesis of a connection between Chinese tourists and the syphilis outbreak in Japan controversial, although our interpretation is based on a limited number of samples and on values with a wide (6-7 year) 95% HPD (Fig. 2b).
The results of this study indicated several features about the genetic variants of some TPA genes. First, macrolide resistance mutations in TPA might be reversible, although there have been no data indicating reversion to the wild-type allele. As all the strains in Japan in Fig. 2b were phylogenetic branches from node V, it seems more likely that macrolide resistance was extant at the time node V formed, rather than that most of the strains in Fig. 2b independently mutated to macrolide resistance during the relatively short time after node V formed. Macrolide resistance mutations have been noted previously to have strong stability 11,19 , but existence of 2 macrolide-sensitive TPA strains in Japan (Fig. 2b) implied that there might have been a reverse mutation because of a fitness cost associated with carrying those mutations. However, a fitness cost-hypothesis contradicts the fact that most of the strains in Japan depicted in Fig. 2b still keep the resistance mutations. So, we could not exclude the possibility that the apparent 'reverse mutations' occurred independent of the putative fitness cost during the diversifications as other general SNPs did and have been kept under the absence of the drug. This scenario might meet the fact that azithromycin is no longer recommended for treatment to syphilis in most of the countries in the world.
Second, all the macrolide-resistant strains in Japan in this study carried the A2058G mutation in the 23S rRNA gene, but not the A2059G mutation. This was in agreement with the results of our previous study of over 100 strains in Japan 17,18 .
Third, in this study, the mrcA A506V mutation, which was considered to be unique to strains in China 9,11 , was also observed in EAC strains in Japan. However, some mutations in the pbp2 gene (TPANIC_0760) were not identified in any of the strains in Japan, while most of the strains (9 of 11) in China harbored at least 1 mutation in this gene. Of the 3 SNPs (i.e., A366T, I415F, and I415M) in the TPANIC_0760 gene, only I415F has been suggested to have a deleterious effect on the protein's structural flexibility or its binding constant for substrate stability 9 . However, the effect of these SNPs on the possible generation of the penicillin-resistance is not known at present, because there has been no documentation of penicillin-resistant TPA strain so far, although penicillin has been used extensively to treat syphilis for more than 70 years.
Apart from that, in this study, the separation between the clusters of TPA strains in China and in Japan shown by the phylogenetic analyses was found, for most of the strains, in the SNP analyses of the TPANIC_0760 gene, although there were 2 exceptional strains in China that had no mutation in this gene (SMUTp_04 and X-4, Fig. 2b). In this context, it is noticeable that the strain X-4 was one of those 2 exceptional strains in China, and was phylogenetically separated from other strains in China (Fig. 2b). This strain was, rather, very closely related to a Japanese strain 15A011MM forming a small sub-cluster which was branch from node VI (Fig. 2b). These lines of information implied that this small sub-cluster might reflect the limited case(s) of direct international spread of TPA between China and Japan by the 'Japanese type' strain(s) (with TPANIC_0705 A506V mutation and without any mutation in TPANIC_0760 gene) in the recent year.
Finally, this study confirmed a Nichols-lineage strain carrying the pbp1 P564L mutation. This SNP has been commonly observed in, and limited to, the genomes of SS-14 strains 9,11 . The study reported here strongly suggested that TPA strains in any lineage could carry mutations in pbp1.
In conclusion, most of the TPA strains in Japan in this study had a close relationship to TPA strains in China, forming the EAC. The MRCA of the EAC is likely to have become extant about 2006. The TPA strains in China and Japan subsequently formed a separate cluster in each country in about 2007. The genomes of the TPA strains in each country then expanded their phylogenetic diversity during the time that country had an increasing number of syphilis cases. In addition, phylogenetic analysis showed that TPA strains from MSM cases in Japan clustered separately from strains from heterosexual cases. These findings, within the context of the recent global resurgence of syphilis, provide a better understanding of the phylogenetic features and transmission networks of syphilis, both domestic and global.

Methods
Clinical samples. Samples were collected from patients with suspected syphilis between 2014 and 2018 in four clinics and one hospital in Tokyo and Osaka prefectures, along with information on each patient's gender and self-reported sexual orientation. These information were collected voluntarily. Other information of the patients was not systematically collected. It was suggested that most of the patients were in the primary or secondary stage, although not necessarily specified. This study was approved by the institutional review board of the National Institute of Infectious Diseases (approval numbers 508 and 705), and all the experiments were performed in accordance with the directions from the review board above. The informed consent was obtained from all participants.
Detection, DNA extraction, and molecular typing of treponemal DNA. Tris-EDTA (TE) buffersuspensions (swab samples from genital, anal, and oral lesions) and DNA extracts (the other samples) prepared by a DNeasy Blood & Tissue Kit (Qiagen, Inc., Valencia, CA) were used as the PCR templates. The TPA polA and tp47 genes-targeted PCR was performed as described previously 17,18 . When at least one amplicon of the two genes was generated, a sample was considered to contain TPA-DNA. Molecular typing according to the ECDCT system 4 and determining the presence of the A2058G and/or A2059G mutations in one or both of the TPA 23S rRNA genes were performed by short-read mapping as described below. For the Japanese strains which showed less than genome × 50 coverage in 23S rRNA genes (Table S1), Sanger sequencing was performed and no discrepancies between the two methods were confirmed. The genomes of 20 of the 49 TPA strains met our criteria of at least 90% coverage of the Nichols genome and a minimum of 10 reads. In addition, published treponemal genomes that met our criteria (Table S1) were included in our analyses. These short-reads were mapped to the Nichols genome using bwasw function of bwa v.0.7.17 20 and SNPs were identified using samtools v.1.9 21 and VarScan v.2.4.3 22 as previously described 23 . The default parameters were used for SNP identification except that the minimum coverage was set to ten. Repeat and recombinogenic regions were detected by MUMmer v.3.2259 24 and gubbins 25 , respectively, and were removed from further analyses. Phylogenetic relationships were determined by reconstructing a phylogenetic tree by the maximum likelihood method using IQ-TREE with 1000 ultrafast bootstrap replicates 26 . For temporal analyses, the Bayesian Evolutionary Analysis Sampling Tree (BEAST) v. 2.6.1 program 27 was used. Whole genome sequences of all the TPA strains detected in Japan in this study and TPA genomes in the database with no or limited passages (Table S1) were used for this analysis as previously described 11 . Briefly, a recombination-masked SNP alignment was analyzed using a GTR4 substitution model and a strict clock model with a burnin of 10 million cycles followed by 100 million MCMC cycles. The starting molecular clock rate was set to 3.6 × 10 -4 as described previously 11 . Maximum likelihood and BEAST trees were plotted using a ggtree and a ggplot package of R v. 3.6.2 28 . To confirm the temporal signal in our tree, we used TempEst v.1.5.3 for root-totip analysis and the TIPDATINGBEAST package 29 in R for date-randomization test as previously described 11 . Twenty new datasets with randomly assigned dates were analyzed by BEAST and the results showed no evidence of temporal signal in these replicates, suggesting that the signal in our tree was not found by chance.
Macrolide resistance SNPs were determined by mapping the short reads to the Nichols genome. For identification of non-synonymous variants of penicillin binding protein (PBP) genes pbp1 (TPANIC_0500), pbp2 (TPANIC_0760), mrcA (TPANIC_0705) and Tp47 (TPANIC_0574), the short-reads were assembled using the SPAdes v. 3.13.1 30 . The PBP gene sequences were extracted from the draft genomes. In the cases of those variants could not be determined from draft genomes, the variants were determined by PCR-based Sanger sequencing of the initial DNA samples with the primers shown in Table S2.
Epidemiological data on syphilis cases in Japan and China. Information on syphilis cases in Japan between 2001 and 2016 was obtained from public annual reports provided by the National Institute of Infectious Diseases website 31 . Information on syphilis cases in China between 1995 and 2016 was obtained from a published article 32 . Using these data, we determined the number of primary and secondary syphilis cases, which generally indicate more recent infections and can be a more timely indication of the ongoing spread of syphilis at that time than latent syphilis or asymptomatic cases.

Data availability
Accession numbers Short read data have been deposited in GenBank under accession number DRA009733.