In humans, dental caries is defined as the demineralisation of teeth by the bacterial production of organic acid from fermentable substrates such as monosaccharides, oligosaccharides and processed polysaccharides1,2,3. The dental caries pandemic has previously been discussed from anthropological and archaeological viewpoints in terms of the consumption of fermentable carbohydrates4,5,6,7. Indeed, the emergence of dental caries positively correlates with the increased consumption of carbohydrates through their main source, grain8,9, following the advent of prehistoric farming. At that time, the degree of caries was comparatively mild, since oral bacteria produced only low amounts of organic acid. However, the widespread use of refined sugar during the industrial revolution, driven by improvements in food processing techniques, stabilisation of plantation-based sugar cane supplies and the abolition of the Corn Law10, led to a sudden increase in the prevalence of caries.

Thus far, only anthropological factors, like those described above, have been considered in the development of dental caries in humans, while factors affecting the disease-causing bacteria themselves have been largely ignored. This may be based on the firmly held belief that caries-causing bacteria preceded the development of caries. However, there are presently no studies describing the origin of caries-causing bacteria. Today, Streptococcus mutans is considered the principal causative organism of dental caries. These bacteria have the ability not only to produce large amounts of acid from various sugars but also to adhere firmly to the tooth surface using adhesive glucans produced from refined sugar by glucosyltransferases (GTFs)11. This latter property of caries-causing bacteria is the principal characteristic distinguishing prehistoric caries from modern caries.

Because of their role in the formation of dental biofilms, streptococcal GTF enzymes are thought to be one of the primary virulence factors of Str. mutans responsible for the development of dental caries1,11. Thus, phylogenetic analysis of the origin of GTFs may provide novel insights into the acquisition of cariogenicity by Str. mutans. GTFs (EC:, which are encoded by gtf genes, belong to the glycosyl hydrolase family 70 and catalyse the transfer of d-glucopyranosyl units from sucrose to acceptor molecules. To predict the ancestry of streptococcal GTFs, phylogenetic trees for the putative amino acid sequences of glycosyl hydrolase family 70 proteins were constructed using the neighbour-joining (NJ), minimum evolution (ME) and maximum parsimony (MP) methods. The surrounding sequences of the genes encoding the glycosyl hydrolase family 70 proteins were also analyzed. Our findings suggest that the genus Streptococcus acquired the gtf genes via horizontal gene transfer. Through the acquisition of GTFs, Str. mutans became capable of forming cariogenic dental biofilms. Thus, our data support the idea that the pandemic of dental caries is likely to have been caused by not only anthropological factors but also the evolution of Str. mutans.


Phylogenetic analyses of glycosyl hydrolase family 70 enzymes

To determine the origin of genes contributing to the cariogenic potential of Str. mutans, 3 phylogenetic trees of glycosyl hydrolase 70 family enzymes (Supplementary Table S1) were constructed using the NJ, the ME and the MP methods. The tree generated using the MP method is shown in Figure 1 and the other 2 are presented in supplemental information. The 3 resultant trees showed complete congruence. Streptococcal GTF enzymes exhibit homology with dextransucrases from Leuconostoc and GTFs from Lactobacillus and Lactococcus, suggesting a common ancestry for the genes encoding these enzymes.

Figure 1

Phylogenetic analysis of glycosyl hydrolase family 70 enzymes.

A phylogenetic tree of the amino acid sequences of glycosyl hydrolase family 70 enzymes was constructed using the maximum parsimony (MP) method. Genes bracketed with bold lines are adjacent to each other in the bacterial genome. The value on each branch is the estimated confidence limit (expressed as a percentage) for the position of the branches, as determined by bootstrap analysis. Only values exceeding 50% are shown.
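
The bootstrap percentages described in this legend come from resampling alignment columns with replacement and asking how often a grouping recurs. The following is a minimal sketch of that idea in Python, not the MEGA procedure used for the figure: instead of rebuilding a maximum parsimony tree for each replicate, it uses a simplified stand-in, counting how often two sequences form the closest pair by p-distance. The function names, the toy sequences and the "closest pair" criterion are all assumptions made for illustration.

```python
import random
from itertools import combinations


def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def resample_columns(seqs, rng):
    """Draw alignment columns with replacement (one bootstrap replicate)."""
    n = len(seqs[0])
    cols = [rng.randrange(n) for _ in range(n)]
    return ["".join(s[i] for i in cols) for s in seqs]


def closest_pair_support(seqs, i, j, replicates=500, seed=1):
    """Percentage of replicates in which sequences i and j are the unique
    closest pair. This is a simplified proxy for branch support, assumed
    here for illustration only."""
    rng = random.Random(seed)
    target = (min(i, j), max(i, j))
    hits = 0
    for _ in range(replicates):
        rep = resample_columns(seqs, rng)
        dist = {
            (a, b): p_distance(rep[a], rep[b])
            for a, b in combinations(range(len(rep)), 2)
        }
        smallest = min(dist.values())
        winners = [pair for pair, d in dist.items() if d == smallest]
        if winners == [target]:
            hits += 1
    return 100.0 * hits / replicates


# Hypothetical toy alignment, not data from the study.
toy = ["AAAAAAAAAA", "AAAAAAAAAA", "TTTTTTTTTT"]
support_01 = closest_pair_support(toy, 0, 1)
support_02 = closest_pair_support(toy, 0, 2)
print(support_01, support_02)
```

In a real analysis each replicate would be passed to the tree-building method itself, and the support for a branch would be the percentage of replicate trees that contain it.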

Mapping of the genomic location of glycosyl hydrolase family 70 genes

The upstream and downstream regions of gtf genes from 4 species of bacteria are shown in Figure 2A. Transposase sequences were observed in the upstream and/or downstream regions of gtf180 from Lactobacillus reuteri and gtfKg15 from Lactobacillus sakei. The region downstream of gtfR from Streptococcus oralis also exhibited sequence homology to streptococcal transposase genes (Fig. 2B). The sequence identity between this region and the transposase genes from Str. pneumoniae, Str. suis, Str. gordonii and Str. mitis was 80, 80, 78 and 76%, respectively. However, sequences homologous to transposase genes were not detected downstream or upstream of gtfB, gtfC and gtfD in Str. mutans.

Figure 2

Schema of sequences surrounding the gtf gene.

(A) Genomic location mapping of the gtf gene. (a) gtf180 from Lcb. reuteri, (b) gtfKg15 from Lcb. sakei, (c) gtfR from Str. oralis, (d) gtfB and gtfC from Str. mutans and (e) gtfD from Str. mutans. (B) Sequence of the transposase-homologous region downstream of gtfR. The region enclosed by the broken line in A (c) is shown in detail in B. FR00720602 and AE007317 are the NCBI accession numbers of the Str. oralis Uo5 and Str. pneumoniae R6 genomes, respectively. The number after each accession number indicates the location in the genome.

Phylogenetic analysis of the catalytic domain in streptococcal GTFs

To infer the age associated with ecological events, the mean interpopulational evolutionary diversity in the sequences encoding the catalytic domain of glycosyl hydrolase family 70 genes among Streptococcus, Leuconostoc and Lactobacillus was estimated as 0.043% (S.E., 0.003). Further, to identify the most evolved enzyme among the streptococcal GTFs, we reconstructed the phylogenetic tree with only the streptococcal GTF catalytic domains and their coding sequences using the NJ method. The catalytic domain of Lcb. reuteri GTFB and its coding sequence were used as the root (Fig. 3A and B). Trees based on both the gene and deduced amino acid sequences indicated that the streptococcal GTF family could be classified into 3 clusters: the water-soluble glucan-synthesising group (WSG), the water-insoluble glucan-synthesising group (WIG) and the intermediate group (INT). The phylogenetic distances based on the catalytic domain DNA sequences are shown in Table 1. The distances between the root and each streptococcal gtf gene group were found to be almost identical. In addition, the distance between the root and gtfB was found to be nearly the same as the distance between the root and gtfD. In contrast, the distance between the root and gtfC was slightly larger.

Table 1 Mean phylogenetic distances (lower triangle) and standard error (upper triangle) of the catalytic regions among Lcb. reuteri GTFB (LR GTFB), streptococcal GTF groups and 3 gtf genes from Str. mutans
Figure 3

Phylogenetic analysis of the functional regions of streptococcal GTF.

Phylogenetic trees based on the gene encoding the catalytic region (A), the deduced amino acid sequence of the catalytic region (B), the gene encoding the first unit of the glucan-binding region, which is constructed of 6 direct repeat units (C), and the deduced amino acid sequence of the first unit of the glucan-binding region (D) were constructed by the neighbour-joining (NJ) method. In A and B, the streptococcal gtf gene family can be classified into 3 clusters: the water-soluble glucan-synthesising group (WSG), the water-insoluble glucan-synthesising group (WIG) and the intermediate group (INT). In D, GTFs from the same or closely related species could be divided into the same cluster. The value on each branch is the estimated confidence limit (expressed as a percentage) for the position of the branches, as determined by bootstrap analysis. Only values exceeding 50% are shown. The scale bar ([NJ] distance) represents a 5% difference in nucleotide sequence.

Further analysis of the streptococcal GTF catalytic domains using a BLAST search showed that these regions are homologous to α-amylase, but not to sucrase. In the blastp analysis of the catalytic domain of Str. mutans GTFB, the E-value for the α-amylase catalytic domain was 8.25e-03. This finding suggests that the catalytic regions of streptococcal GTFs are not phylogenetically related to sucrase, despite having the same function.

Phylogenetic analysis of the glucan-binding domain in streptococcal GTFs

Since the glucan-binding domains in streptococcal GTFs are closely associated with cariogenicity, phylogenetic analyses were also performed on the coding genes and putative amino acid sequences of the first unit of the glucan-binding domains, which are constructed of 6 direct repeating units. The phylogenetic trees were constructed using the NJ method and the glucan-binding domains of Lcb. reuteri GTFB were used as the root (Fig. 3C and D). The trees based on both the coding genes and the putative amino acid sequences could not be classified into the 3 clusters observed in Figure 3A and B. Although no periodicity of clustering was observed in the tree based on the genes, the tree based on the putative amino acid sequences indicated that GTFs from the same or closely related species could be divided into the same cluster. In the SYSTERS Protein Family Database, the glucan-binding regions were categorised as Pfam CW_binding 1 (PF01473), which is a cell wall binding protein family that includes glucan-binding protein (Gbp), choline-binding protein, 1,4-β-N-acetylmuramidase and N-acetylmuramoyl-l-alanine amidase.


Phylogenetic analyses of glycosyl hydrolase family 70 enzymes were performed. Glucosyltransferase-S enzymes from Lactococcus lactis, encoded by pspA or pspB12, have the simplest structures and were consequently defined as the roots of the dendrograms. However, these enzymes do not actually belong to the glycosyl hydrolase family 70 enzymes, since the alignment analysis showed that they do not possess the catalytic and glucan-binding domains characteristic of this enzyme family. The constructed trees indicated that the streptococcal GTFs were derived from other lactic acid bacteria following their spread through the genera in the order of Lactococcus, Lactobacillus, Leuconostoc and Streptococcus. More specifically, our data suggest that Lcb. reuteri GTFB or Lcb. reuteri GTFML4 would be the practical ancestor of the streptococcal GTFs. Our previous finding that only 51% of 41 Str. oralis clinical isolates possessed the gtf gene13 supports the idea that some Streptococcus spp. are still in the process of acquiring the gtf gene. The average rate of sequence divergence at synonymous sites, as determined by a comparison of homologous protein-coding regions, is 0.90% per million years14. Given that the interpopulational divergence in the sequences encoding the catalytic domain of GTFs is 0.043%, their expansion would have occurred approximately 48,000 years ago (0.043% divided by 0.90% per million years). Thus, Streptococcus acquired the gene only recently on the timescale of bacterial evolution.
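
The dating estimate in this paragraph is a simple division of the observed divergence by the divergence rate. A minimal Python sketch of that arithmetic is given below. The function name is hypothetical, and treating one standard error (0.003) either side of the divergence as a rough range is an assumption of this sketch, not part of the reported analysis.

```python
def divergence_time_years(divergence_percent, rate_percent_per_myr):
    """Elapsed time in years, given a sequence divergence (%) and a
    divergence rate expressed in % per million years."""
    if rate_percent_per_myr <= 0:
        raise ValueError("rate must be positive")
    return divergence_percent / rate_percent_per_myr * 1_000_000


RATE = 0.90          # % per million years, as cited in the text
DIVERGENCE = 0.043   # %, interpopulational divergence reported in the text
STD_ERROR = 0.003    # reported S.E.; used here only for a rough range (assumption)

estimate = divergence_time_years(DIVERGENCE, RATE)
lower = divergence_time_years(DIVERGENCE - STD_ERROR, RATE)
upper = divergence_time_years(DIVERGENCE + STD_ERROR, RATE)

print(f"point estimate: {estimate:,.0f} years")   # close to 48,000 years
print(f"rough range:    {lower:,.0f} to {upper:,.0f} years")
```

The calculation assumes a constant rate and inherits all the uncertainty of the cited rate, so the result is best read as an order-of-magnitude estimate.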

It may seem likely that glycosyl hydrolase family 70 enzymes were produced by a combination of sucrase and glucan-binding proteins, such as Gbp from Str. mutans. However, in silico analyses using the DNA and protein databases revealed that streptococcal GTF catalytic regions are homologous to α-amylase, but not to sucrase. These analyses further showed that the glucan-binding regions are categorised as CW_binding 1, a family that includes 1,4-β-N-acetylmuramidase and N-acetylmuramoyl-l-alanine amidase. These findings suggest that the common ancestor of the glycosyl hydrolase family 70 enzymes would be an enzyme group whose general function was sugar hydrolysis and sugar chain binding. In addition, in the present database, glucan-binding proteins such as Gbp were found only in Str. mutans. For this reason, it has been suggested that Gbp, one of the virulence factors allowing adherence to the tooth surface, was first produced through the loss of the first half of glucosyltransferase in Str. mutans.

Homologues can be acquired through gene duplication (paralogs) or horizontal gene transfer (orthologs)15,16 and evidence exists to support both methods of acquisition in the glycosyl hydrolase family 70 genes. Our study located a transposase encoding gene upstream or downstream of the gtf gene in both Lcb. reuteri and Lcb. sakei17 (Fig. 2A). A sequence highly homologous to streptococcal transposase genes was also found downstream of Str. oralis gtfR18 (Fig. 2A and B). It has been reported that intra- and interspecies genomic variation in Streptococcus spp. is associated with transposase and that the bacterial population is a flexible gene pool19. These findings suggest that the streptococcal gtf gene was acquired through horizontal gene transfer via transposons, while periodic extinguishing of transposable elements may explain their absence around the gtf gene in some species of bacteria20. On the other hand, glycosyl hydrolase family 70 genes are believed to easily duplicate, since many bacteria possess multiple glycosyl hydrolase family 70 genes within the same genome. Tandem-arranged genes such as gtfB and gtfC (Fig. 2) are considered to have replicated by tandem duplication, a phenomenon observed in Str. mutans21, Streptococcus salivarius22 and Streptococcus criceti and in the lactic acid bacterium, Lcb. reuteri (Fig. 1). Based on these findings, Str. mutans may have acquired the 3 gtf genes as orthologs from other bacteria and/or as paralogs by duplication.

The acquisition of a gene occurs as a result of adaptation to the habitat23. During this process, it is thought that some parts of the acquired gene are modified to adapt to species-specific circumstances and that the other parts are conserved to keep the primary function14. For this reason, it is thought that the diversity of the conserved region reflects the systematic evolution of the acquired gene without species-specific modification. The catalytic regions of GTFs are highly homologous among streptococcal species and are highly conserved to keep sucrase activity. Thus, to observe the systematic evolution of streptococcal gtf genes, phylogenetic analyses of the catalytic region were carried out (Fig. 3A and B). Our results show that the phylogenetic distances between the root and each of the 3 streptococcal gtf gene groups (WIG, WSG and INT) were almost identical, suggesting that these groups differentiated within the same period. This finding is in contrast with the view commonly held by dental researchers that GTFB and GTFC are the most advanced Str. mutans GTF enzymes, since they primarily synthesise the water-insoluble α-1,3-linked glucans directly associated with cariogenicity.

Here, we have shown that enzymes in the upper part of the phylogenetic tree synthesise glucans with various linkage types such as α-1,3; α-1,6; α-1,2; and α-1,4, while those in the lower part of the tree synthesise only water-soluble α-1,6-linked glucans (Fig. 1 and Supplementary Table S1). Previous studies have shown that streptococcal GTF produces soluble glucans as a result of the displacement of amino acids in the catalytic domain and/or a decrease in the number of repeating units in the glucan-binding domain24,25. Thus, it was suggested that streptococcal GTF evolved to synthesise water-soluble glucans and that the ability to synthesise water-insoluble glucans would only be acquired if the ancestral character was strongly retained and highly specialised.

As shown in Figure 3A and B and Table 1, both gtfB and gtfC belong to the WIG cluster and adjoin each other in the phylogenetic tree, with only a short distance of 0.144 separating them. gtfC probably replicated after gtfB, as its phylogenetic distance from the root is larger (gtfC, 0.560; gtfB, 0.539). In contrast, gtfB and gtfD are divided into separate clusters and the phylogenetic distance between them is relatively large (0.397). Thus, it is likely that these 2 genes were acquired separately; however, the similar phylogenetic distances between these genes and the root (gtfB, 0.539; gtfD, 0.537) suggest that they were acquired at similar time points.

Character acquisition is an evolutionary response that follows exposure to different selection pressures such as starvation conditions and thermal stress26. The resultant functional divergence plays an important role in the varying environmental adaptations of organisms23,27,28,29. For oral streptococci, it is conceivable that the availability of fermented foodstuffs and encounters with other lactic acid bacteria that possessed glycosyl hydrolase family 70 enzymes were major environmental selection pressures contributing to their acquisition of gtf genes. Non-oral streptococci, such as Str. pneumoniae and pyogenic group streptococci, do not possess the gtf gene. At least, the gtf gene does not exist in the genomes of those streptococci in the current databases. This suggests that streptococci that did not encounter lactic acid bacteria carrying the gtf gene could not acquire the gene. We propose that Streptococcus obtained the first gtf gene via gene transfer as an initial step towards cariogenicity. The character acquisition was then completed following the consumption of refined sugar by humans, which acted as a secondary selection pressure prompting the acquisition of multiple gtf genes (Fig. 4). We also propose that this acquisition allowed Str. mutans to develop a niche on the tooth surface, as opposed to other oral regions, such as the saliva, tongue, pharynx or larynx.

Figure 4

Hypothetical model of GTF enzyme acquisition by oral streptococci.

The genus Streptococcus acquired the gtf gene via horizontal gene transfer when it encountered lactic acid bacteria, such as Lactobacillus and Leuconostoc, in the human oral cavity. This event coincided with the introduction of fermented foods into the human diet, which was brought about by the development of agricultural practices and food preservation methods.

In this study, we suggest that the reasons underlying the emergence of dental caries were not only anthropological but also bacteriological, i.e., genetic events leading to the evolution of cariogenic Str. mutans. We therefore recommend that a molecular biological approach, such as PCR-based detection of the gtf genes30, be adopted for future anthropological and archaeological studies. This would enable a more precise distinction between caries and tooth wear as well as analysis of the historical expansion of Str. mutans.


Glycosyl hydrolase family 70 sequences

The DNA and amino acid sequences of glycosyl hydrolase family 70 proteins used in this study were obtained from GenBank at NCBI with cross-reference to Pfam. Sequencing analyses of unknown streptococcal gtf genes were performed as described previously18,30 and the sequences were deposited in the DNA Data Bank of Japan. We analysed 20 GTFs from Streptococcus; 2 GTFs, 9 dextran sucrases and 1 alternan sucrase from Leuconostoc; 10 glucan sucrases from Lactobacillus; and 2 GTFs from Lactococcus. The NCBI accession numbers of these GTFs are provided in Figure 1 and Supplementary Table S1.

Homology and protein family search

Motifs from GTFs were analysed by homology and protein family searches using BLAST at NCBI and the SYSTERS Protein Family Database at the Max Planck Institute for Molecular Genetics, Computational Molecular Biology, respectively.

Phylogenetic analysis

Sequence alignment was performed using ClustalX software version 1.8331. Multiple alignment files saved by ClustalX in the Clustal format (*.aln) were converted to the MEGA format (*.meg) using the MEGA version 5 software32. Phylogenetic analysis was performed by the NJ, ME and MP methods using MEGA version 5 software. Phylogenetic distances were calculated by the NJ method using the same software. For the NJ algorithm, gaps and missing data were handled with the ‘complete deletion’ option, and the ‘nucleotide: p-distance’ model and the ‘bootstrap method’ were used. The mean interpopulational evolutionary diversity of the sequences encoding the catalytic domain of GTFs among Streptococcus, Leuconostoc and Lactobacillus was also calculated with MEGA 5. The ‘complete deletion’ option, the ‘bootstrap method’ and the ‘p-distance’ model were used in this analysis.
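
To make the two settings named above concrete, the following Python sketch shows what 'complete deletion' and 'p-distance' mean for a nucleotide alignment: every column containing a gap in any sequence is discarded, and the distance between two sequences is the proportion of remaining sites at which they differ. This is an illustration only and does not reproduce MEGA's implementation or its interpopulational diversity estimator. The function names, the set of symbols treated as missing data and the toy alignment are assumptions.

```python
from itertools import combinations

GAP_SYMBOLS = set("-?N")  # assumption: symbols treated as gaps or missing data


def complete_deletion(aligned):
    """Remove every alignment column in which at least one sequence
    carries a gap or missing-data symbol."""
    seqs = list(aligned.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    keep = [i for i in range(length)
            if all(s[i] not in GAP_SYMBOLS for s in seqs)]
    return {name: "".join(s[i] for i in keep) for name, s in aligned.items()}


def p_distance(a, b):
    """Proportion of compared sites at which two aligned sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Hypothetical toy alignment, not data from the study.
toy = {
    "seqA": "ATGC-TAGGA",
    "seqB": "ATGCATAGGT",
    "seqC": "ATGGATAG-A",
}

cleaned = complete_deletion(toy)
distances = {
    (m, n): p_distance(cleaned[m], cleaned[n])
    for m, n in combinations(cleaned, 2)
}
for pair, d in distances.items():
    print(pair, d)
```

Because complete deletion removes a column for the whole data set, adding one gappy sequence shortens the comparison for every pair, which is worth keeping in mind when comparing distances across analyses with different sequence sets.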

Mapping of the genomic location of glycosyl hydrolase family 70 genes

The locations of the genes encoding glycosyl hydrolase family 70 proteins from Str. mutans, Lcb. reuteri and Lcb. sakei were obtained using the Annotation Search Tool in the Comprehensive Microbial Resource of the J. Craig Venter Institute or by referring to previous reports17,33. The upstream and downstream sequences of the gtfR gene from the Str. oralis Uo5 genome (GenBank: FR720602.1) were analysed using BLAST (NCBI).