Introduction

Conventional dairy starter bacteria including Streptococcus thermophilus, Lactobacillus delbrueckii subsp. bulgaricus and Lactococcus lactis have a long history of use in the home-made and modern manufacture of fermented dairy foods such as yogurt and cheese1,2. These dairy starters ferment milk lactose to produce lactic acid, which decreases the pH to 4.5–4.7 and results in the coagulation of milk proteins3,4. Among these important conventional starters, S. thermophilus is a non-pathogenic and homofermentative facultative anaerobe, which is used for the manufacture of yogurt and certain types of cheese. There has been increasing interest in using novel EPS-producing S. thermophilus strains to enhance the functionalities of yogurt and cheeses5,6,7,8,9.

As of April 2014, six strains of S. thermophilus had been fully sequenced and their whole-genome sequence data released in the NCBI Genome database10,11,12,13,14. Comparative genome analysis of dairy S. thermophilus suggests that proteolytic activity, nitrogen metabolism, sugar utilization and transporter systems play crucial roles in adaptation to the milk environment7,12,15. Dairy S. thermophilus has acquired its “generally recognized as safe” status through loss-of-function events such as the decay and loss of virulence determinants during evolution; in addition, both lateral gene transfer (LGT) and natural competence contribute to the shaping of the S. thermophilus genome. This kind of evolution results in diverse metabolic activities and gives new functionalities to dairy foods10,16. Common features of dairy S. thermophilus include rapid acidification of milk, acid tolerance, bacteriocin synthesis, lactose utilization, production of formic and folic acids, innate and adaptive immunity, bacteriophage resistance and, most importantly, exopolysaccharide (EPS) biosynthesis7,15. These features are important for the use of dairy S. thermophilus as a starter bacterium in milk fermentation.

Extracellular polysaccharide, also known as exopolysaccharide (EPS), produced by lactic acid bacteria (LAB) including S. thermophilus is generally regarded as food-grade because it is naturally produced5,8,9. EPS may be secreted into the medium as ropy EPS, or may be attached to the cell surface of the microorganism in the form of capsular EPS8. EPS has been reported to improve the viscosity and texture of yogurt and some cheeses and to prevent syneresis in yogurt5,8,17,18,19,20,21. Moreover, EPS produced by dairy LAB is able to replace chemically modified starches or milk fat in commercial yogurt, especially set-type yogurt, and to contribute considerable rheological effects, mouthfeel and creaminess to fermented milk products5,20,21. Certain EPSs have also been reported to have important probiotic characteristics such as immunostimulatory properties, anti-oxidative effects and anti-microbial activities against pathogens22,23,24.

In general, EPS yield among the majority of S. thermophilus strains varies from 20 mg/L to 600 mg/L in milk-based medium under optimal conditions9,25. Among all the reported EPS yields for the species, S. thermophilus ASCC 1275 (ST 1275) produced the highest known amount of EPS (~1,029 mg/L) in milk medium in the presence of 0.5% whey protein concentrate when fermentation was carried out at pH 5.5 and 37°C for 24 h26. Moreover, ST 1275 produced both capsular and ropy EPS20,27. It has been documented that capsular EPS does not cause ropiness in milk products whereas ropy EPS contributes to the enhanced texture of milk products28. Our previous studies have shown that the high amount of EPS produced by ST 1275 exhibited texture-modifying properties in Mozzarella cheese and yogurt17,18,19,20,21. Additionally, the use of ST 1275 for milk fermentation contributed to the development of low-fat or fat-free yogurt and Mozzarella cheese17,18,20,29,30,31,32. Thus, any effort to increase EPS yield in milk would be of great significance for enhancing the functionalities of fermented dairy foods.

Assembly of the EPS repeating unit is determined by the eps gene cluster, which has been described in detail in certain species of LAB and has so far shown diverse gene structures33,34. Despite the release of eps gene clusters from six sequenced strains of S. thermophilus10,11,12,13,14, their EPS yields remain unknown; this may be due to the commercial nature of these strains or to their low yield of EPS. Hence, our understanding of high EPS-producing S. thermophilus at the genomic level is still limited. Based on our previous studies on the high EPS yield of ST 1275 in milk, we used ST 1275 in the current study as a model dairy starter to investigate the mechanism of high EPS yield in S. thermophilus at the genomic level.

Results

Genome sequencing and assembly

The ST 1275 genome was sequenced with one shotgun run and one 8 kb-span paired-end run on a 454 Roche GS Junior System. A total of 72,487,271 bases from 158,162 raw shotgun reads and 56,596,072 bases from 152,819 raw paired-end reads were assembled into 65 contigs and 4 scaffolds, giving an average sequencing depth of ~62 fold. De novo assembly produced a draft genome of 4 scaffolds containing 44 large contigs with a contig N50 of 100,486 bp, indicating that the assembly was highly contiguous. Only three gaps were found at the junctions of contigs, and these were filled in by conventional PCR and Sanger sequencing. De novo shotgun paired-end pyrosequencing is thus able to provide high sequencing depth for microbial genomes.
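For readers less familiar with the N50 statistic quoted above, the following is a minimal sketch of how a contig N50 is conventionally computed. The contig lengths are hypothetical toy values, not the ST 1275 contigs, and the function name `n50` is our own; the assembler's reported value may rest on additional filtering (for example, counting only large contigs).

```python
def n50(lengths):
    """Return the N50 of a set of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")


# Hypothetical contig lengths in bp (illustration only).
toy_contigs = [100, 80, 60, 40, 20]
print(n50(toy_contigs))  # total = 300, half = 150; 100 + 80 reaches 180 -> 80
```

A larger N50 relative to genome size means that fewer, longer contigs account for most of the assembly.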

General features of ST 1275

The complete circular genome of ST 1275, a plasmid-free bacterium, was 1,845,495 bp with an average GC content of 39.06% (Fig. 1). A comparison of the general features of five sequenced S. thermophilus strains and the ST 1275 genome is shown in Table 1. Compared with the other sequenced S. thermophilus strains, ST 1275 possessed the lowest numbers of rRNA operons (5) and tRNAs (55). Moreover, the highest number of separate CRISPR/Cas loci (four) was found in its genome, suggesting that this organism may have better adaptive immunity against various bacteriophage infections.
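As an illustration of the GC content figure, the sketch below computes the percentage of G and C bases in a nucleotide string. The sequence is a short hypothetical example, not ST 1275 data, and the function name `gc_content` is our own; ambiguous bases such as N are simply excluded from the denominator, which is one common convention among several.

```python
def gc_content(seq):
    """Return the GC percentage of a DNA sequence.

    Only unambiguous bases (A, C, G, T) are counted, so characters
    such as N do not affect the result.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * gc / acgt


# Hypothetical 10-base sequence: 2 G + 2 C out of 10 bases.
print(gc_content("ATGCGCATTA"))  # 40.0
```

Applied to the full chromosome sequence (GenBank accession CP006819), a calculation of this kind would be expected to give a value close to the reported 39.06%.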

Table 1 Comparison of general genome features of sequenced S. thermophilus

Figure 1

Circular genome map of the S. thermophilus ASCC 1275 chromosome.

The genome of plasmid-free ST 1275 is 1,845,495 bp with an average GC content of 39.1%. The circular genome map was generated with the CGView Server59. The GenBank accession number for the ST 1275 genome is CP006819.

The results of functional annotation of ST 1275 and the other five sequenced S. thermophilus strains are shown in Fig. 2. In general, no major differences were found in the number of genes in each functional group. In all six strains, the three functional groups containing the most genes were those associated with proteins, with amino acids and with carbohydrate metabolism. This indicates that these three functional groups are closely associated with the adaptation of S. thermophilus to the milk environment, in particular to nutrients such as milk proteins and lactose.

Figure 2

Comparison of functional annotation of ST 1275 and the other five sequenced S. thermophilus strains using the RAST server.

The nucleotide sequences of six sequenced S. thermophilus were uploaded into the RAST server based on SEED subsystems for functional annotations.

Carbohydrate utilization and sugar transport system

Sugar uptake, transport systems and sugar hydrolases in ST 1275 are shown in Supplementary Table 1. The part of sugar metabolism involved in nucleotide sugar biosynthesis is shown in Fig. 3. Our previous studies have demonstrated that this organism metabolizes lactose into lactic acid efficiently, resulting in rapid acidification of milk (pH 4.5–4.7) within 8 h during milk fermentation26. This is the pH at which coagulation of milk takes place and, importantly, 8 h is an acceptable fermentation period for industrial processing. In addition to utilizing lactose, galactose and glucose, ST 1275 appears to be able to ferment mannose and fructose (Supplementary Table 1 and Fig. 3). However, sucrose, mannose and fructose are the only three sugars that may be transported by specific phosphoenolpyruvate-dependent phosphotransferase systems (PEP-PTS), while lactose- and glucose-specific PEP-PTS are not present in ST 1275. Since lactose is the main sugar in milk, rapid acidification of milk by this starter is highly dependent on the utilization of lactose during milk fermentation.

Figure 3

Nucleotide sugars biosynthesis for EPS production in S. thermophilus ASCC 1275.

The numbers refer to the enzymes involved: 1, β-Galactosidase; 2, Glucokinase; 3, Phosphoglucomutase; 4, UDP-glucose pyrophosphorylase; 5, UDP-glucose 4-epimerase; 6, UDP-galactose 4-epimerase; 7, Galactose 1-phosphate uridylyltransferase; 8, Galactose mutarotase; 9, Galactokinase; 10, dTDP-glucose pyrophosphorylase; 11, dTDP-glucose 4,6-dehydratase; 12, dTDP-4-keto-6-deoxy-glucose 3,5-epimerase; 13, dTDP-4-keto-L-rhamnose reductase; 14, Fructokinase; 15, 6-phosphofructokinase; 16, Phosphoglucose isomerase; 17, Glutamine-fructose-6-phosphate transaminase; 18, Phosphoglucosamine mutase; 19 & 20, N-acetylglucosamine-1-phosphate uridyltransferase (bifunctional); 21, UDP-galactopyranose mutase.

Unlike the limited number of hydrolases for amylose in other sequenced S. thermophilus strains, intact genes including one α-amylase, one glucanhydrolase, three glycogen debranching proteins and two alkaline amylopullulanases were found in the ST 1275 genome (Supplementary Table 1). This suggests that this organism may have efficient amylolytic activity to break down starch35. This may be important for achieving high cell density in fermentations that use amylose as a cheap source of carbohydrate.

EPS biosynthesis and comparison of eps gene cluster

All essential components for EPS production, including complete nucleotide sugar biosynthesis (Fig. 3) and a novel eps gene cluster for EPS assembly (Fig. 4), were found in the ST 1275 genome. This starter contains the highly conserved epsA-epsB, assigned to the regulation of biosynthesis, and eps1C-eps1D, which determine the chain length of EPS12,36. The epsE gene encodes a membrane-associated priming glycosyltransferase, which does not catalyze a glycosidic linkage but transfers a sugar-1-phosphate to the undecaprenyl-phosphate lipid carrier on the cytoplasmic face of the membrane34,37. Subsequently, the glycosyltransferases encoded by epsF, epsG, epsH, epsI, epsJ and epsK may transfer various nucleotide sugars, including UDP-glucose, UDP-galactose, dTDP-rhamnose, UDP-GlcNAc and UDP-galactofuranose, to form the repeating units in a glycosidic linkage-dependent manner34,37. Additionally, a unique UDP-galactopyranose mutase was found in this cluster for the synthesis of UDP-galactofuranose. However, the chemical structure and sugar composition of the repeating unit remain to be determined. Remarkably, we found for the first time an additional eps2C-eps2D pair in this cluster, which may also be involved in chain length determination. Polymerization and translocation of the repeating units are assigned to epsL and epsN, respectively. Together, epsO and epsP are possibly responsible for phosphorylation events, while epsQ is assigned to the transfer of EPS between the membrane and the peptidoglycan layer. It has been documented that the orf14.9 gene, which is present in the eps gene clusters of all six strains (Fig. 4), is associated with the cell growth of S. thermophilus38.

Figure 4

Comparison of the eps gene cluster of S. thermophilus ASCC 1275 with those of the other five sequenced S. thermophilus strains.

The predicted functions of each color-coded ORF (intact or truncated) are indicated in the bottom panel. The size of each ORF in the eps gene cluster is indicated in each pentagon (intact) or chevron (truncated).

In general, nucleotide sugar biosynthesis is one of the two key factors for EPS yield in lactic acid bacteria (LAB); the other is the eps gene cluster, which governs assembly of the EPS repeating unit. However, the eps gene clusters of LAB show various structures, indicating that the production and chemical structure of EPS are strain-specific33,34. Interestingly, the occurrence of two gene pairs for determining the chain length of EPS in the ST 1275 genome, namely eps1C-eps1D and eps2C-eps2D, implies that this starter may produce EPSs of different molecular sizes. This is consistent with our previous finding that ST 1275 is a producer of both capsular and ropy EPS20,27.

Proteolytic system

Milk is known to be a poor source of carbon and free amino acids, but contains an abundance of proteins such as casein. Extracellular proteinase (known as PrtS), membrane transporters and intracellular peptidases have been found to contribute to the utilization of exogenous proteins by S. thermophilus in milk7,12. Hence, the proteolytic system of ST 1275 plays a crucial role in the adaptation of this organism to milk. For extracellular proteinase, ST 1275 encodes one intact PrtS (T303_05205), which is involved in the cleavage of casein to oligopeptides and is found in only some strains of S. thermophilus. This is a key component for cell growth in milk39,40,41. Oligopeptides and free amino acids are then transported into cells by membrane amino acid/peptide transporters. Remarkably, an abundance of intracellular proteases and peptidases was found in ST 1275 (Supplementary Table 2). These help ST 1275 cells break down oligopeptides into free amino acids for cellular metabolism or for direct utilization.

Two-component regulatory systems

The two-component regulatory systems (TCRSs) and related loci are shown in Supplementary Table 3. It has been documented that TCRSs are closely associated with stress and adaptive responses, bacteriocin biosynthesis, natural competence and biofilm formation42,43. Seven intact TCRSs were found in ST 1275 (Supplementary Table 3). However, certain functions of TCRS have been poorly characterized in S. thermophilus and most of them have unknown functions or are involved in multiple cellular responses7.

Stress response systems

Acid resistance, cold and heat response, salt resistance and oxidative stress response systems for ST 1275 are shown in Supplementary Table 4. These loci in the ST 1275 genome may play important roles in adapting cells to stressful conditions, such as the presence of oxygen, heat and cold, acid and salt. In addition to the TCRSs in ST 1275, additional stress regulators (T303_00880 and T303_09015) may be involved in the regulation of adaptive cellular responses.

Similar to other sequenced S. thermophilus strains, ST 1275 contains almost the same numbers and types of heat-shock proteins, cold-shock proteins and oxidative stress response-related genes for bacterial fitness or performance.

For acid resistance, a proton-translocating F0F1-ATPase system and a urease system coupled with ammonia permease were found in the ST 1275 genome (Supplementary Table 4). These may contribute to internal pH homeostasis in this starter when it faces an extremely acidic environment, such as that created by the acids produced during milk fermentation. However, no loci encoding intact amino acid deiminase or decarboxylase, which are also associated with the maintenance of internal pH in bacteria, were found in the ST 1275 genome44,45. Remarkably, among all the species of LAB the urease system is found only in S. thermophilus and has been shown to be effective for the control of internal pH homeostasis46.

Interestingly, several salt resistance-related genes were found in ST 1275 genome. Since S. thermophilus is an essential starter for the manufacture of several common types of cheeses, these genes may help ST 1275 cells survive or adapt to high level of salt, especially in cheeses containing high level of salt.

Defense system

The loci encoding bacteriocin biosynthesis, multidrug resistance genes and competence proteins for natural transformation are shown in Supplementary Table 5. Lantibiotics are commonly produced by S. thermophilus as anti-microbial agents against other microbes such as food-borne pathogens47. Additionally, several early and late competence genes were found in the ST 1275 genome. Interestingly, it has been demonstrated that Ami (oligopeptide transporter), signal peptide and comX (sigma factor) are important for the induction of early competence development in S. thermophilus48,49,50. Natural competence is closely associated with LGT, such as the acquisition of novel genes in S. thermophilus2,16.

Moreover, several genes for multidrug resistance (Supplementary Table 5), including two β-lactamases, were found in its genome. However, genes encoding this enzyme, which hydrolyzes β-lactam antibiotics, are very common in recognized probiotics such as Lactobacillus rhamnosus GG and in LAB generally51. The β-lactamase genes may have been obtained via LGT during evolution, when β-lactam antibiotics were in common and large-scale use in the 20th century. Other multidrug ABC transporter systems may be useful for removing cytotoxic compounds. Additionally, a mucus-binding protein (T303_03820) was found in the ST 1275 genome, which indicates that this organism may have potential as a probiotic organism able to colonize and survive in the human gastrointestinal tract, especially in individuals exposed to β-lactam antibiotics.

CRISPR/Cas system against bacteriophage infection

Four separate CRISPR/Cas loci were predicted in the genome of ST 1275 by the CRISPR finder online service (Fig. 5). The CRISPR/Cas system has recently been documented as a prokaryotic defense system against bacteriophage infection. Bacteria have several mechanisms against bacteriophage infection, such as encounter blocks, resistance to viral adsorption, penetration blocks, restriction modification and the CRISPR/Cas system52. Of these, the CRISPR/Cas system is widely distributed in prokaryotes as an adaptive immunity against bacteriophage infection. In addition to innate immunity, such as the restriction modification system in dairy starters, adaptive immunity is very important for both the dairy and starter culture industries in guarding against phage infection, which causes failure of milk fermentation52,53.

Figure 5

Structure of CRISPR/Cas loci in S. thermophilus ASCC 1275.

DR, direct repeat. The four different DRs are shown in black and the spacers in other colors. The consensus sequence and size of each of the four DRs are indicated at the lower right of each locus. The size of each CRISPR-associated protein in the locus is indicated in each pentagon (intact) or chevron (truncated).

Interestingly, ST 1275 contains the highest number of CRISPR/Cas loci among all the sequenced strains of S. thermophilus, with four CRISPR loci and 24 CRISPR-associated protein (cas) genes, two of which are truncated. In general, three CRISPR loci are located downstream of cas genes, while CRISPR2 is located in the middle of the cas genes in CRISPR/Cas locus 2. Moreover, the four CRISPR loci have three different spacer numbers and four different consensus sequences of direct repeats (DRs). These diverse CRISPR/Cas loci suggest that ST 1275 may have better adaptive immunity against different bacteriophages than other sequenced S. thermophilus strains. This is important for the industrial manufacture of dairy products that use this organism. In particular, the CRISPR1 locus has the highest numbers of DRs and spacers of the four loci. This suggests a possibly effective defense mechanism that integrates novel spacers into CRISPR1 when ST 1275 is exposed to bacteriophages53. The CRISPR2 locus is likely to make a limited contribution to the bacteriophage response because it has fewer spacers. It has been demonstrated that increased expression of the cas1 and cas2 genes was indicative of higher activity in S. thermophilus LMD-9 during the bacteriophage response12. Thus, the presence of cas1 or cas2 genes in the four CRISPR/Cas loci may indicate that these loci play active roles in the defense system.
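To make the relationship between direct repeats and spacers concrete, the sketch below splits a CRISPR array on its repeat sequence and returns the spacers that lie between consecutive repeats. Both the repeat and the array are hypothetical toy sequences, not those of ST 1275, and the function name `extract_spacers` is our own. The sketch assumes every repeat copy matches the consensus exactly, whereas real arrays often contain degenerate repeats that a dedicated tool such as CRISPR finder is designed to handle.

```python
def extract_spacers(array_seq, repeat):
    """Return the spacers located between consecutive direct repeats.

    Assumes exact, non-overlapping copies of `repeat`. Sequence before
    the first repeat (leader) and after the last repeat (trailer) is
    not a spacer and is discarded.
    """
    parts = array_seq.split(repeat)
    return [p for p in parts[1:-1] if p]


# Hypothetical repeat and array: leader + (repeat + spacer) x 2 + repeat + trailer.
toy_repeat = "GTTTTTGTAC"
toy_array = ("AAAC" + toy_repeat + "ACGTACGT" + toy_repeat
             + "TTGACCA" + toy_repeat + "GG")

spacers = extract_spacers(toy_array, toy_repeat)
print(spacers)       # ['ACGTACGT', 'TTGACCA']
print(len(spacers))  # 2 spacers flanked by 3 repeats
```

An array with n repeats carries n - 1 spacers, which is why a locus with more DRs also holds a longer record of past phage exposure.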

Discussion

Because EPS produced by dairy starter bacteria is important for the quality of fermented dairy foods, attention has been paid to novel EPS-producing starters, especially the essential starter Streptococcus thermophilus1,6,7,9. Although several S. thermophilus genomes are available, their EPS yields have not been reported, possibly due to their commercial nature or low EPS yield10,11,12,13,14. So far, numerous studies have identified and characterized eps gene clusters in high EPS-producing LAB, while no genomic data are available for a high EPS-producing starter bacterium, especially for an organism as important as dairy S. thermophilus33,34. To the best of our knowledge, ST 1275 produces the highest known amount of EPS (~1,000 mg/L) under optimal conditions in milk as compared with other well-documented EPS-producing S. thermophilus strains (Supplementary Table 6). However, its regulatory mechanism for EPS yield remains poorly understood and merits further study26. Hence, it was of interest to gain insight at the genomic level into ST 1275 as a model of a high EPS-producing starter in the species S. thermophilus.

In general, S. thermophilus is not able to take up lactose via a lactose-PTS but does so via lactose/galactose permease (LacS). Lactose is then hydrolyzed into glucose and galactose by β-D-galactosidase, and galactose is excreted into the extracellular medium by LacS, resulting in a high concentration of residual galactose in milk after fermentation6. Hence, less galactose is utilized by S. thermophilus, as galactose is mainly metabolized for the synthesis of nucleotide sugars for EPS production. However, the EPS yield in S. thermophilus strains is very limited. Residual galactose in cheeses such as Mozzarella leads to browning during the baking of pizza made with such cheeses54. Thus, high EPS-producing S. thermophilus could be an ideal choice for reducing residual galactose in milk as well as for improving the texture of dairy foods.

Since the eps gene cluster for assembly of the EPS repeating unit and nucleotide sugar biosynthesis are the two factors that have a direct influence on EPS yield, we have paid attention to both of them in the ST 1275 genome. Interestingly, the occurrence of eps1C-eps1D and eps2C-eps2D (Fig. 4), assigned to chain length determination, indicates that this organism may assemble two types of EPS of different molecular size. Based on our previous finding that ST 1275 is a mixed producer of both capsular and ropy EPS20,27, we conclude that ST 1275 produces at least two types of EPS. However, further work to determine the chemical structure of the EPSs from ST 1275 would be important. Additionally, previous studies have demonstrated that increased expression of genes involved in nucleotide sugar biosynthesis improved EPS production in LAB including S. thermophilus9,54. However, very limited information is available on gene expression in the eps gene cluster for EPS assembly. Since LAB commonly have only one epsC-epsD pair for chain length determination, the occurrence of two epsC-epsD pairs indicates a complex regulation of EPS production in ST 1275. In general, EPS production by LAB is cell growth-associated. However, our previous study found that optimization of cultivation conditions such as pH, temperature and the addition of whey protein concentrate resulted in a large increase in EPS yield, while no effect was observed on the growth of ST 1275 cells26. This implies that the expression of genes for nucleotide sugar biosynthesis or EPS assembly was possibly changed in ST 1275 under optimal conditions. Thus, the mechanism regulating EPS yield in ST 1275 merits further investigation.

Comparisons of common features including carbohydrate utilization, proteolytic system, stress response system and defense system among the sequenced S. thermophilus strains suggest that ST 1275 may serve as a model for a high EPS-producing dairy starter bacterium. Specifically, this strain may possess an effective proteolytic system, which contributes to the adaptation of this organism to milk and to rapid acidification of milk. Acid resistance based on the unique urease system in ST 1275 may improve cell viability in extremely acidic conditions such as those in yogurt. Four separate CRISPR/Cas loci may be effective in controlling phage infection. An abundance of multidrug resistance genes and a mucus-binding protein on the cell surface may allow ST 1275 to serve as a probiotic candidate for survival and colonization in the gut and for improving gut homeostasis. The elucidation of the ST 1275 genome makes this organism a model dairy starter bacterium for high EPS yield among the species S. thermophilus.

Methods

Bacterial strain and culture conditions

S. thermophilus ASCC 1275 (ST 1275), a typical dairy starter bacterium, was obtained from the Australian Starter Culture Research Center (ASCRC; now Dairy Innovation Australia Limited, Werribee, Victoria, Australia). This organism was stored at −80°C in 10% (w/v) reconstituted skim milk containing 20% (v/v) glycerol and was activated by growing anaerobically on M17 agar (BD Company, NJ, USA) at 37°C for 24 h. After successful activation, a typical individual colony was inoculated into M17 broth containing 1% lactose and incubated anaerobically at 37°C for 18 h. Cells were then harvested for genomic DNA extraction.

Genomic DNA extraction

Genomic DNA was extracted from ST 1275 using the CTAB/NaCl method according to the protocol from DOE Joint Genome Institute (JGI, http://my.jgi.doe.gov/general/protocols.html). Briefly, bacterial cultures were harvested by centrifugation, re-suspended in TE buffer containing lysozyme, SDS and Proteinase K and incubated at 37°C for 1 h, followed by addition of CTAB/NaCl (pre-warmed to 65°C), incubation at 65°C for 10 min and DNA purification using phenol/chloroform/isoamyl alcohol (25/24/1, v/v/v). Genomic DNA was precipitated with isopropanol and washed with 70% ethanol. Finally, the DNA pellet was dried and resuspended in TE buffer containing 0.1 mg/mL of RNase. The concentration and quality of the genomic DNA were then measured with a Nanodrop-1000 UV/Vis spectrophotometer (NanoDrop Technologies, DE, USA).

De novo shotgun paired-end pyrosequencing and genome assembly

The whole genome of ST 1275 was generated by a combination of shotgun sequencing, paired-end pyrosequencing and Sanger sequencing55,56. Briefly, shotgun sequencing was performed on a 454 GS Junior System (Roche Diagnostics, CT, USA) using a GS FLX Titanium rapid library preparation kit according to the manufacturer's instructions (Roche Diagnostics). One extra paired-end pyrosequencing run was carried out using an 8 kb-span library to produce a draft genome. The raw reads were assembled de novo into contigs using Newbler 2.7 (Roche Diagnostics). To complete the whole genome of ST 1275, primers were designed and gaps in the draft genome were filled by sequencing PCR products on an ABI 3730 capillary sequencer.

Gene prediction and annotation

Gene annotation was carried out using the NCBI Prokaryotic Genome Annotation Pipeline56. GLIMMER v3.02 was used for coding sequence (CDS) prediction57. BLASTp was used to align the amino acid sequences against the NCBI non-redundant database. Amino acid sequences encoded by predicted genes were searched against all proteins from complete microbial genomes; hits with an alignment length over 90% of the sequence's own length and over 60% match identity were retained, and the best BLAST hit, with the highest alignment length percentage and match identity, was assigned as the annotation of the predicted gene55. Further annotation was obtained using the SEED-based automated annotation system provided by the RAST server58.
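The hit-selection rule described above can be sketched as a small filter. This is an illustration under stated assumptions rather than the pipeline actually used: the `Hit` record, the function name `best_annotation` and the subject names are hypothetical, alignment coverage is taken relative to the query length, and ties are broken by coverage first and identity second, an ordering the text does not specify.

```python
from typing import List, NamedTuple, Optional


class Hit(NamedTuple):
    """One BLASTp hit for a query protein (hypothetical record layout)."""
    subject: str
    aln_len: int          # alignment length in residues
    pct_identity: float   # percent identical residues


def best_annotation(query_len: int, hits: List[Hit],
                    min_cov: float = 90.0,
                    min_ident: float = 60.0) -> Optional[Hit]:
    """Pick the annotation hit for one query.

    Keeps hits whose alignment covers more than `min_cov` percent of the
    query and whose identity exceeds `min_ident` percent, then returns
    the one with the highest (coverage, identity). Returns None if no
    hit passes both thresholds.
    """
    passing = []
    for hit in hits:
        coverage = 100.0 * hit.aln_len / query_len
        if coverage > min_cov and hit.pct_identity > min_ident:
            passing.append((coverage, hit.pct_identity, hit))
    if not passing:
        return None
    return max(passing, key=lambda item: (item[0], item[1]))[2]


# Hypothetical hits for a 200-residue query.
toy_hits = [
    Hit("hypothetical_A", 190, 70.0),  # 95% coverage, passes
    Hit("hypothetical_B", 198, 65.0),  # 99% coverage, passes
    Hit("hypothetical_C", 200, 55.0),  # identity too low
    Hit("hypothetical_D", 150, 99.0),  # coverage too low
]
print(best_annotation(200, toy_hits).subject)  # hypothetical_B
```

With tabular BLAST output, the alignment length and percent identity columns would supply `aln_len` and `pct_identity` directly.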

Bioinformatic analysis

CRISPR finder, an online tool (http://crispr.u-psud.fr/Server/), was used for identifying CRISPR/Cas systems in bacteria. Ortholog assignment and metabolic pathway mapping of ST 1275 were performed on the amino acid sequences of the CDSs using the KEGG Automatic Annotation Server (KAAS; http://www.genome.jp/tools/kaas/), an online service based on the bi-directional best hit (BBH) method.
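The idea behind the bi-directional best hit criterion can be sketched as follows. This is a simplified illustration, not a reproduction of KAAS, whose own scoring and thresholds are not shown here; the gene and ortholog identifiers and the function name `bidirectional_best_hits` are hypothetical. A query gene and a reference gene form a BBH pair only when each is the other's best hit.

```python
def bidirectional_best_hits(best_a_to_b, best_b_to_a):
    """Return sorted (a, b) pairs that are each other's best hit.

    `best_a_to_b` maps each gene in genome A to its best hit in B;
    `best_b_to_a` maps each gene in genome B to its best hit in A.
    """
    return sorted(
        (a, b)
        for a, b in best_a_to_b.items()
        if best_b_to_a.get(b) == a
    )


# Hypothetical best-hit tables (identifiers are placeholders).
query_to_ref = {"geneA1": "ref1", "geneA2": "ref2", "geneA3": "ref2"}
ref_to_query = {"ref1": "geneA1", "ref2": "geneA3"}

pairs = bidirectional_best_hits(query_to_ref, ref_to_query)
print(pairs)  # [('geneA1', 'ref1'), ('geneA3', 'ref2')]
```

Here geneA2 is excluded because its best hit, ref2, points back to geneA3 instead, which is how the reciprocal requirement filters out paralogs and other one-sided matches.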