Improper use of antibiotics creates evolutionary pressures that drive bacteria to acquire drug resistance by natural mutation and/or horizontal transfer of resistance genes. This is a major public health threat: it is estimated that drug-resistant infections cause 10 million deaths annually and may result in economic losses reaching 100 trillion US dollars by 20501. However, a target-to-hit screen typically requires ~24 discovery projects and 94 million US dollars, and the baseline total cost is 1.8 billion US dollars over 13 years to launch a new drug2. In fact, the number of new antibiotics developed and approved has steadily decreased in the past three decades, leaving fewer options for treating resistant bacteria3.

Streptococcus pneumoniae is one of the pathogens posing the greatest threat to human health4,5. S. pneumoniae belongs to the mitis group6,7 and is a major cause of pneumonia, sepsis and meningitis8,9. In 2015, pneumococcal pneumonia caused over 1.5 million deaths in individuals of all ages, and this rate increased in people over 70 years old between 2005 and 201510, which is especially problematic since the elderly population is growing in many parts of the world. Although pneumococcal conjugate vaccines have considerable benefits, non-vaccine pneumococcus serotypes have increased worldwide11,12.

Conflict between the host immune system and pathogens leads to evolutionary arms races known as the Red Queen scenario13,14. Protein regions at the host–pathogen interface are subjected to the strongest selective pressure and thus evolve under positive selection. Adaptive evolution has been reported in genes related to the mammalian immune system such as pattern recognition receptors14. Concerning negative/purifying selection, Jordan et al. compared two whole-genome sequences of Escherichia coli and showed that essential bacterial genes appear to demonstrate substantially lower average values of synonymous and nonsynonymous nucleotide substitution rates compared to those in nonessential genes15. However, to our knowledge, comprehensive evolutionary analysis of codons of genes encoding bacterial cell surface proteins has not been performed. Mutations in essential genes directly cause host death because essential genes encode proteins that maintain processes required for basic bacterial survival such as central metabolism, DNA replication, and translation of genes into proteins. Meanwhile, nonessential genes are under considerable negative/purifying selection, which would be important for the survival and/or success of the species in the host and/or the environment as non-synonymous substitution of codons can lead to lineage extinction (Fig. 1). Phylogenetic and molecular evolutionary analyses can reveal the number of codons under negative/purifying selection in a species. Because alterations in amino acid residues in regions under negative selective pressure are not allowed, drugs targeting these regions would be less likely to promote the development of resistance through natural mutation.

Fig. 1
figure 1

Scheme for intra-species molecular evolutionary analysis. a Random genetic drift induces synonymous and non-synonymous mutations with equal probability. However, non-synonymous mutations in essential region are removed by host selection. b As a result of natural selection, synonymous substitutions are concentrated in important genes. Phylogenetic and molecular evolutionary analyses can detect significant accumulation of synonymous substitutions in codons of host proteins. Codon-based analysis yields much more information than nucleotide- or amino acid-based analyses

We analysed pneumococcal choline-binding proteins (CBPs) localised on the bacterial cell surface through interaction with choline-binding repeats and phosphoryl choline on the cell wall. At least some CBPs play key roles in cell wall physiology, in pneumococcal adhesion and invasion, and in evasion of host immunity. S. pneumoniae harbours various CBPs including N-acetylmuramoyl l-alanine amidase (LytA), which induces pneumococcal-specific autolysis16,17,18. Pneumococcal surface protein A (PspA) is a highly variable protein and inhibits complement activation17,18,19,20. Choline-binding protein A (CbpA; also called PspC) works as a major pneumococcal adhesin and contributes to evasion of host immunity via interaction with several host proteins17,18,21. Choline-binding protein L (CbpL) contains the choline binding repeats sandwiched between the Excalibur and lipoproteins domains and works as an anti-phagocytic factor22. Although several CBPs have been characterised, their phylogenetic relationships remain unclear and the unclassified gene names are confusing. We first analysed the distribution of genes encoding CBPs based on pneumococcal genome sequences. Orthologues of genes in each strain were identified by phylogenetic analysis. We then calculated the evolutionary selective pressure on each codon from the phylogenetic trees and aligned sequences. We found that cbpJ contains the highest rate of codons under negative selection. CbpJ has no known functional domains except signal sequences and choline-binding repeats, and its role in pneumococcal pathogenesis is unclear. Functional analyses revealed that CbpJ contributes to evasion of host neutrophil-mediated killing in pneumococcal pneumonia. Thus, evolutionary analysis focusing on negative selection can reveal novel virulence factors.


Distribution of cbp genes among pneumococcal strains

Genes encoding CBPs among pneumococcal strains were extracted by tBLASTn search (Supplementary Data 1). Some genes were re-annotated since the search results showed that certain homologous regions were not matched to annotated open reading frames (ORFs). In strain SPNA45, SPNA_01670 contains both predicted promoter regions and intact ORF structures of cbpF and cbpJ. On the other hand, cbpG-homologous regions in strains R6, D39, SPN034183, SPN994038 and SPN994039 did not contain promoters (Supplementary Data 1 and Supplementary Table 1). Orthologous relationships of each gene were analysed. The distribution of cbp genes was not correspondent with capsular serotypes (Fig. 2). Four genes—i.e., lytA, lytB, cbpD and cbpE—were conserved as intact ORFs in all 28 pneumococcal strains (Fig. 2). Other cbp genes contained frameshift mutations in the orthologues or were absent in some strains.

Fig. 2
figure 2

Distribution of genes encoding CBPs among pneumococcal strains. The gene locus tag numbers are shown in Supplementary Data 1. Blue, yellow, and grey show the presence, pseudogenisation, and absence of genes, respectively. *These genes are annotated as one gene, but our bioinformatic analysis indicates that they are independent genes

Phylogenetic relationships in pneumococcal CBPs

Phylogenetic relationships of genes encoding CBPs in pneumococcal species are confusing since some genes in the same cluster show high similarity to each other. To clarify the relationships, we compared common nucleotide sequences among genes encoding CBPs in the strain TIGR4. Maximum likelihood and Bayesian phylogenetic analyses revealed two common clusters: one comprising cbpF, cbpG, cbpJ, cbpK and cbpC, and the other comprising lytA, lytB, lytC, cbpL and cbpE (Fig. 3 and Supplementary Fig. 1). The names of some cbp genes were not consistent with those of phylogenetically related genes. In particular, cbpF, cbpG, cbpJ and cbpK were located close to each other in pneumococcal genomes and showed high similarity. We thus defined orthologous genes in each pneumococcal strain based on maximum likelihood and Bayesian phylogenetic analyses (Fig. 4 and Supplementary Fig. 2). The gene locus tag numbers in orthologous relationships are shown in Supplementary Data 1. The sequence similarity of cbpF, cbpG, cbpJ and cbpK and their close proximity within genomes indicated that a common ancestral S. pneumoniae acquired the genes by duplication. Phylogenetic trees showed well-separated clusters of each gene. These independent relationships indicated that horizontal gene transfer did not contribute to the spread of cbpF, cbpG, cbpJ and cbpK in S. pneumoniae species, despite their ability to take up exogenous DNA. The genetic diversity of these genes may have been established by accumulation of natural mutations during pneumococcal transmission.

Fig. 3
figure 3

Phylogenetic relationship of cbp genes in TIGR4. Nucleotide-based Bayesian phylogenetic tree of cbp genes of S. pneumoniae strain TIGR4. The tree is unrooted and posterior probabilities are shown near the nodes. The scale bar indicates nucleotide substitutions per site

Fig. 4
figure 4

Phylogenetic analyses of cbp genes with high similarity. a, b Nucleotide-based Bayesian phylogenetic tree of the lytA, lytB, lytC, cbpE and cbpL genes (a) and the cbpF, cbpG, cbpJ and cbpK genes (b) in S. pneumoniae. The trees are unrooted although they are presented as midpoint-rooted for clarity. Strains with identical sequences are listed on the same branch. Posterior probabilities are shown near the nodes. The scale bar indicates nucleotide substitutions per site

Evolutionary selective pressures on each of the CBP codons

To evaluate the significance of CBPs in real-life infection and transmission, we performed molecular evolutionary calculations based on bacterial genome diversity established after transmission of infection in an uncontrolled population. The nucleotide sequences of each CBP were aligned by codon, and conserved common codons were used for phylogenetic analysis (Supplementary Fig. 3). The selective pressure on each gene was calculated based on the phylogenetic trees and aligned sequences (Table 1). The rates of codons under negative selection are visualised in Supplementary Fig. 4. Over 13% of total codons in cbpJ and lytA were under negative selection compared to <5% for other cbp genes, indicating that these genes play an important role in the success of S. pneumoniae species. On the other hand, pspA encoding the genetically divergent virulence factor PspA, contained fewer evolutionarily conserved codons, but had the highest numbers of codons under positive pressure. Additionally, there were no evolutionarily conserved codons in cbpG, cbpC and cbpL. The latter two had no common codons as few genes had frameshift mutations. When we re-calculated selective pressure without these genes, we found a low rate of codons under negative selecion among CBP-encoding genes (Supplementary Table 2).

Table 1 Evolutionary analyses of genes encoding choline-binding proteinsa

CbpJ acts as a virulence factor in pneumococcal pneumonia

While CbpJ had the highest rate of codons under negative selection among pneumococcal CBPs, it has no known functional domains except a choline-binding repeat in its amino acid sequence. Moreover, its role in pneumococcal pathogenesis is unknown. In contrast, CbpL had no common comparable codons and showed limited numbers of evolutionarily conserved codons even after the above-described adjustment. The domain structures and codons of CbpJ and CbpL under negative selection are shown in Fig. 5a. The domains were searched using MOTIF Libraries including PROSITE, NCBI-CDD, and P-fam23,24,25,26. To assess the roles of CbpJ and CbpL in pneumococcal pathogenesis, we generated mutant strains deficient in the corresponding genes. The mutant strains showed a slightly steeper growth curve in THY medium, and there were no significant differences in maximum growth rate (Supplementary Fig. 5a, b). There were no differences among the strains in minimum inhibitory concentration or minimum bactericidal concentration values for penicillin G or in bacterial morphology (Supplementary Table 3 and Supplementary Fig. 5c). Gram-stained WT and mutant strains in stationary phase showed that most cells were stained violet, whereas almost all cells of strains in the decline phase were stained pink probably due to autolysis (Supplementary Fig. 5c). The lytA gene expression was slightly increased in the ΔcbpJ strain compared to that in the WT strain at the log and decline phases (Supplementary Fig. 5d). However, as described above, the difference did not seem to affect pneumococcal autolysis substantially. We first performed a mouse intranasal infection assay to investigate the role of CbpJ and CbpL in pneumonia. Mice intranasally infected with strain ΔcbpJ showed an improved survival rate compared to those infected with WT S. pneumoniae; although a similar tendency was observed for ΔcbpL-infected relative to WT mice; the difference was not statistically significant (Fig. 5b). The number of bacteria in the bronchoalveolar lavage fluid (BALF) from ΔcbpJ-infected mice was lower than that in the BALF from ΔcbpL- and WT-infected mice (Fig. 5c). We also performed competitive assay by intranasal co-infection with the WT and ΔcbpJ strains. The BALF at 24 h after infection showed fewer bacterial CFUs of ΔcbpJ compared to those of the WT (Fig. 5d). We also examined whether CbpL or CbpJ contributes to the association of S. pneumoniae with alveolar epithelial cells and found that WT S. pneumoniae as well as ΔcbpL and ΔcbpJ mutant strains did not differ in their ability to adhere to A549 human alveolar epithelial cells (Fig. 5e).

Fig. 5
figure 5

Deficiency of cbpJ decreased pneumococcal virulence in mouse pneumonia model. a Amino acid sequences and domain structures of CbpL and CbpJ in strain TIGR4. Bold, black underlined, and magenta underlined characters represent comparable codons and those under purifying or positive selection, respectively. b Mouse pneumonia model. Mice were intranasally infected with 5 × 107 CFU of S. pneumoniae TIGR4 WT, ΔcbpL, or ΔcbpJ strains, and survival was monitored for 14 days. c Pneumococcal CFU in BALF collected at 24 h after intranasal infection. The difference between groups was analysed using the Kruskal–Wallis test with Dunn’s multiple comparisons test. d S. pneumoniae TIGR4 WT and ΔcbpJ strains were examined for their competitive infection activities. BALF was collected at 24 h after intranasal infection. The difference between groups was analysed with the Wilcoxon matched-paired signed rank test. Competitive ratio was calculated by dividing the ΔcbpJ-CFU of a mouse by the WT-CFU of the same mouse. e S. pneumoniae TIGR4 WT, ΔcbpL and ΔcbpJ strains were examined for their ability to associate with A549 cells. Differences between groups were analysed using ordinary one-way ANOVA with Tukey’s multiple comparisons test. Data are presented as the mean of 6-8 samples with standard error (ce)

However, the S. pneumoniae WT strain exhibited extensive inflammatory cell infiltration and bleeding compared to that with the ΔcbpJ strain. Histological examination of lung tissue from intranasally infected mice showed that ΔcbpJ induced milder inflammation compared to the WT strain. Lung tissue from ΔcbpL-infected mice showed moderate inflammation (Fig. 6a). We also measured the bacterial survival rate after incubation with human neutrophils in the absence of serum. Strains ΔcbpJ and ΔcbpL had a lower survival rate than that of the WT, whereas ΔcbpJ showed a slightly increased growth rate compared to that of the WT and ΔcbpL strains in RPMI 1640 medium without neutrophils (Fig. 6b and Supplementary Fig. 5e). We also generated recombinant CbpJ using a codon-optimised cbpJ sequence for expression in E. coli and measured the bacterial survival rate after incubation with neutrophils and the recombinant protein. In the presence of recombinant CbpJ, the survival rate of the ΔcbpJ strain was recovered (Supplementary Fig. 6). These results suggest that CbpJ contributes to the evasion of neutrophil-mediated killing. Next, we performed a mouse intravenous infection assay to investigate the role of CbpJ and CbpL in sepsis. In the infection model, the survival rates of ΔcbpL- and ΔcbpJ-infected mice did not differ significantly from those of mice infected with WT S. pneumoniae (Fig. 6c). We also performed a blood bactericidal assay. The survival rates of ΔcbpJ and ΔcbpL strains in mouse blood were comparable to those of the WT strain (Fig. 6d). We also found that incubation of S. pneumoniae in human plasma for 3 h inhibited the expression of cbpL and cbpJ, as determined by quantitative real-time PCR (Fig. 6e). These results indicate that CbpJ acts as a pneumococcal virulence factor in lung infection by contributing to the evasion of neutrophil-mediated killing, whereas CbpJ has no role in bacterial survival in blood. In addition, cbpL deficiency in strain TIGR4 did not significantly attenuate pathogenesis in the mouse lung and blood infection.

Fig. 6
figure 6

cbpJ and cbpL are downregulated in the presence of plasma, and do not affect pneumococcal survival in mouse blood. a Haematoxylin and eosin staining of infected mouse lung tissue collected 24 h after intranasal infection with 5 × 107 CFU of S. pneumoniae TIGR4 WT, ΔcbpL, or ΔcbpJ strains. Scale bars, 200 µm (upper panels), 50 µm (middle panels) and 20 µm (lower panels). b Growth of pneumococcal strains in the presence of human neutrophils. Bacterial cells were incubated with neutrophils for 1, 2 and 3 h at 37 °C and 5% CO2, then serially diluted and plated on THY blood agar. The number of CFUs was determined following incubation. Growth index was calculated by dividing the CFU after incubation by the CFU of the original inoculum. Data are presented as the mean of six samples with standard error. c Mouse sepsis model. Mice were intravenously infected with 2 × 106 CFU of S. pneumoniae TIGR4 WT, ΔcbpL or ΔcbpJ, and survival was monitored for 14 days. Differences between infected mouse groups were analysed with the log-rank test. d Growth of pneumococcal strains in mouse blood. Bacterial cells were incubated in blood for 1, 2 and 3 h at 37 °C and 5% CO2. Data are presented as the mean of six samples with standard error. e Fold changes in transcript levels of cbpL and cbpJ in TIGR4 WT S. pneumoniae cells in the presence or absence of human plasma. The 16 S rRNA gene was used as an internal standard. Data were pooled and normalised from three independent experiments, each performed in triplicate. Differences between groups were analysed with the ordinary one-way ANOVA with Tukey’s multiple comparisons test (b, d, e)


In this study, we investigated the evolutionarily conserved rates of CBP codons since these cell surface proteins directly interact with the external environment, which induces rapid rates of evolution in genes involved in genetic conflicts14. Evolutionary analysis based on phylogenetic relationships can reveal regions in which the encoded amino acids are not allowed to change even under selective pressure. The genetic diversity of S. pneumoniae isolated from patients was the result of transmission in a real population. Thus, the evolutionary conservation rate is a parameter that reflects the importance of the protein in human infection. Although so-called arms races involve both the host and bacteria, most studies on genetic diversity have focused on the former14,27,28,29. For example, evolutionary studies based on inter-species comparisons have shown that most of the positive selection targets in host receptors are located in regions that are responsible for direct interactions with pathogens. Our study focused on negative selection targets in bacterial surface proteins through an evolutionary analysis based on intra-species comparisons. This approach enabled us to estimate the contribution of bacterial proteins to species success throughout the life cycle, including inside the host and during the transmission phase.

We previously detected bacterial virulence factors by function prediction – e.g., by searching for conserved motifs/domains, constructing random transposon libraries, or analysing the biochemical properties of the pathogen30,31,32,33,34. Although these laboratory-based approaches are valuable, they are time-consuming and costly, and may not yield the expected results. It is useful to examine the correlation between a target molecule and clinical features as this can minimise the time and cost required for analysis. Furthermore, in basic studies on bacterial pathogens, animal infection models are often used to determine whether a bacterial molecule acts as a virulence factor. Although this is the best means of obtaining in vivo information, it is unclear how accurately it reflects the clinical condition in humans. Combining an evolutionary analysis and an animal model would thus be highly effective for evaluating the functional significance of a putative virulence factor.

Genome-wide association study (GWAS) is a powerful tool for identifying the relationship between genetic variants—mainly single nucleotide polymorphisms (SNPs) – and phenotype, such as in diseases. As GWAS has become more prevalent, various programmes and software packages have been developed for this purpose35,36. On the other hand, this approach has certain limitations including the requirement for an appropriate control group and detailed information regarding phenotype. In infectious diseases, it can be difficult to quantify clinical features recorded at different medical centres. Furthermore, in the case of most pathogens, there are no natural attenuated or avirulent strains that can serve as a control group. Our evolutionary analysis has the advantage that it can be performed with genomic information of pathogenic strains only by assuming the presence of pathogens as a phenotype evading natural selection. Since synonymous and non-synonymous substitutions are estimated to occur with equal probability under no selective pressure, a population in which the latter has resulted in extinction by natural selection can serve as a control group. While we have shown in the current study that evolutionary analysis with a small population has the power to detect evolutionarily conserved proteins, a larger population would allow a higher-resolution analysis, including detection of conserved regions in some pathogenic strains isolated from a specific site of infection or pathological condition. Since this analysis involves simultaneous processing of aligned nucleotide and amino acid sequences, more information is obtained from only SNPs extracted from nucleotide sequences. In addition, automated phylogenetic and evolutionary analyses are needed to analyse a large population. Therefore, the development of software packages for meta-data is expected to aid the widespread application of this analytical approach.

There are some limitations to our evolutionary analysis. First, although it can detect evolutionarily conserved proteins, it cannot identify diverse virulence factors such as PspA and CbpA within species19,37,38. Similarly, virulence factors recently acquired by horizontal gene transfer have not been under selective pressure for a sufficiently long period to perform this analysis. In addition, the high rate of codons under negative selection indicate their universal importance in bacterial species. In other words, a molecule under relaxed selective pressure could contribute to the virulence of some populations of the species. However, these features of molecular evolutionary analysis can be advantages when screening for therapeutic target sites or vaccine antigens with a low frequency of missense mutations, which could reduce the virulence or survivability of the pathogen. Evolutionary analysis could also be an effective alternative strategy for overcoming drug resistance through antigen replacement, and could reduce costs associated with drug discovery and development.

The lytA gene, which was conserved among virtually all pneumococcal strains, showed the highest rates of codons under negative selection, except for cbpJ, which was only present in some strains. LytA is known to induce pneumococcal-specific autolysis39 and contributes to pneumococcal virulence16,40. Our evolutionary analysis supports previous reports that lytA is a suitable genetic marker41,42 due to its evolutionary conservation. We also showed that pspA and cbpA show relatively high rates of codons under positive selection, and both encode polymorphic virulent proteins17,19,37 that are candidate vaccine antigens, even though these genes are not universally present within a global serotype 1 collection38. In addition, selective pressure by vaccines can easily cause differentiation or deficiency of these proteins as the corresponding genes contain few codons under negative selection. A multivalent system would be required for vaccines prepared using these antigens.

An in vivo competition assay in mice indicated that deficiency of cbpJ is a disadvantage for pneumococcal survival in vivo. On the other hand, co-infection showed a smaller difference in bacterial CFUs between WT and ΔcbpJ as compared to each single infection. In the single infection of the ΔcbpJ strain, the bacteria could not be protected by CbpJ. However, in co-infection, the interaction of neutrophils and CbpJ in the WT strain could suppress neutrophil killing activity. In addition, some CbpJ may be released from the WT strain by autolysis. As a result, some of the ΔcbpJ strain could have been protected similar to the WT strain. Concerning selection, it was previously reported that a single-cell bottleneck effect in pneumococcal infection occurs during bloodstream invasion and in transmission between hosts43,44. Our finding also suggests that a bottleneck effect occurs in a limited situation. The difference in bacterial burden of BALF between single and competitive infections suggested a possibility that the bottleneck effect plays a more important role for the selection of cbpJ-lacking cells compared to the competition in the lung.

In this study, cbpL and cbpJ were downregulated in the presence of plasma. Although regulation of CBPs is still largely unknown, one possible hypothesis is that the genes are regulated by a pneumococcal two component system (TCS). S. pneumoniae interplays with its environment by using 13 TCSs and one orphan response regulator45,46. TCSs typically consist of a membrane-associated sensory protein called a histidine kinase and a cognate cytosolic DNA-binding response regulator, which acts as a transcriptional regulator. Although specific stimuli to histidine kinases still remain unclear, there is a possibility that a histidine kinase sensor protein of the TCSs can respond to some plasma components.

Although the difference was not statistically significant, mice intranasally infected with TIGR4 ΔcbpL strain showed a trend towards improved survival relative to the WT-infected mice. In a previous study, a D39lux cbpL-deficient strain showed reduced virulence compared to the WT strain22. Since CbpL sequences in TIGR4 and D39 strains are similar, the discrepancy between the previous study and our findings is likely due to differences in other surface proteins in each strain. For example, the absence of CbpJ, which contributes to the evasion of neutrophil killing, could affect the survivability of D39.

Frolet et al. reported that both CbpJ and CbpL are considered as possible adhesins because they display interaction with C-reactive protein (CRP), and with CRP, elastin, and collagen in solid phase assay, respectively47. Gosink et al. noted no significant differences between WT and cbpJ mutant strains in Detroit nasopharyngeal cells adhesion, rat nasopharynx colonisation, or pathogenesis in a sepsis model between the WT and the cbpJ mutant strains48. Their results are mostly consistent with our data. We also showed that there were no significant differences in the A549 cells adhesion assay and in intravenous infection as a sepsis model. On the other hand, we found a difference in the lethal intranasal mouse infection that is completely different from the non-lethal colonisation model. We consider that CbpJ contributes to pneumococcal evasion of host immunity rather than colonisation. Concerning CbpL, elastin and collagen are extracellular matrix proteins and binding activity to these proteins could contribute to bacterial adhesion, whereas CRP is found in blood plasma and is used as a marker of inflammation. However, CbpL did not contribute at least to pneumococcal adhesion to A549 cells. There is a discrepancy between protein–protein interactions in the solid phase and host cell-bacteria interactions.

Recently, anti-virulence drugs have been developed as an additional strategy to treat or prevent bacterial infections. Drugs targeting bacterial virulence factors are expected to reduce the selective pressure of conventional antibiotics since they would not affect the natural survival of targeted bacteria49. Furthermore, the abundance of candidate targets is a major advantage of antivirulence strategies. Effective design of vaccines and antivirulence drugs requires a thorough understanding of virulence factors; combining our evolutionary analysis and traditional molecular microbiological approaches can improve the detection of potential drug targets. In this study, we identified CbpJ as a novel evolutionarily conserved virulence factor. Thus, molecular evolutionary analysis is a powerful system that can reveal the importance of virulence factors in real-world infections and transmission.


Phylogenetic and evolutionary analyses

Phylogenetic and evolutionary analyses were performed as described previously50,51, with minor modifications. Homologues and orthologues of cbp genes were searched using the tBLASTn function of NCBI BLAST. Domain structures of CbpJ and CbpL were searched by MOTIF Search23 with PROSITE, NCBI-CDD, and P-fam24,25,26. Bacterial ORFs and promoters were predicted using FGENESB (Bacterial Operon and Gene Prediction) and BPROM, respectively52. To prevent node density artefacts, sequences with 100% identity were treated as the same sequence in Phylogears253,54. The sequences were aligned using MAFFT v.7.221 with an L-INS-i strategy55, and ambiguously aligned regions were removed using Jalview56,57. Calculated orthologous regions were used for further phylogenetic analysis, and edited codon sequences were re-aligned using MAFFT with an L-INS-i strategy. The best-fitting codon evolutionary models for MrBayes and RAxML analyses were determined using Kakusan458. Bayesian Markov chain Monte Carlo analyses were performed with MrBayes v.3.2.559, and 2 × 106 generations were sampled after confirming that the standard deviation of split frequencies was <0.01 for up to 8 × 106 generations. To validate phylogenetic inferences, maximum likelihood phylogenetic trees with bootstrap values were generated with RAxML v.8.1.2060. Phylogenetic trees were generated using FigTree v.1.4.261 based on the calculated data.

Evolutionary analyses were performed based on aligned orthologous regions of cbp genes and Bayesian phylogenetic trees. Whole-gene non-synonymous/synonymous ratio calculations as well as statistical tests for negative or positive selection of individual codons were performed using the two-rate fixed-effects likelihood function in the HyPhy software package62.

Bacterial strains and construction of mutant strains

Streptococcus pneumoniae strains were cultured in Todd-Hewitt broth (BD Biosciences, Franklin Lakes, NJ, USA) supplemented with 0.2% yeast extract (BD Biosciences) (THY medium) at 37 °C. For mutant selection and maintenance, spectinomycin (Wako Pure Chemical Industries, Osaka, Japan) was added to the medium at a concentration of 120 µg/ml.

S. pneumoniae TIGR4 isogenic cbpJ (ΔcbpJ) and cbpL (ΔcbpL) mutant strains were generated as previously described33. Briefly, the upstream region of cbpJ or cbpL, an aad9 cassette, and the downstream region of cbpJ or cbpL were combined by PCR using the primers shown in Supplementary Table 4. The products were used to construct the mutant strains by double-crossover recombination with the synthesised CSP263. All mutations were confirmed by PCR amplification of genomic DNA isolated from the mutant strains. For growth measurements, pneumococci were cultured until the optical density at 600 nm (OD600) reached 0.4, and the exponential phase cultures of each strain were back-diluted into fresh THY and grown at 37 °C. Growth was monitored by measuring the values of OD600 every 0.5–1 h. Maximum growth rates were calculated by determining the values of OD600 and by using the DMFit programme64. For the following assays, S. pneumoniae strains were grown to exponential growth phase (OD600 = ~0.4) unless otherwise indicated, and then resuspended in PBS or the appropriate buffer.

Preparation of recombinant CbpJ

The cbpJ sequence without codons encoding the signal peptide sequence was optimised for E. coli using GENEius software, and the optimised sequence was synthesised (Eurofins Genomics, Brussel, Belgium). Optimised cbpJ and pQE-30 vector (Qiagen, Valencia, CA, USA) were amplified with the specific primers listed in Supplementary Table 4 and PrimeSTAR® MAX DNA Polymerase (TaKaRa Bio, Shiga, Japan). The DNA fragments were assembled using the GeneArt® Seamless Cloning and Assembly Kit (Thermo Fisher Scientific, Waltham, MA, USA). The constructed plasmid was transformed into E. coli XL-10 Gold (Agilent, Santa Clara, CA, USA), and recombinant CbpJ was purified as described previously31,33,65,66,67.

Blood and neutrophil bactericidal assays

A blood bactericidal assay was performed as previously described31,33,68. Mouse blood was obtained via cardiac puncture from healthy female CD-1 mice (Slc:ICR, 6 weeks old; Japan SLC, Hamamatsu, Japan). For human neutrophil isolation, blood was collected via venepuncture from healthy donors after obtaining written, informed consent according to a protocol approved by the institutional review board of Osaka University Graduate School of Dentistry (H26-E43). Neutrophils were isolated from fresh human blood by density gradient centrifugation using Polymorphprep (Alere Technologies, Jena, Germany). Pneumococcal cells grown to the mid-log phase were washed and resuspended in phosphate-buffered saline (PBS). Bacterial cells (1 × 104 CFU/20 µl) were combined with fresh mouse blood (180 µl) or human neutrophils (2 × 105 cells/180 µl) in RPMI 1640 medium, and the mixture was incubated at 37 °C with 5% CO2 for 1, 2, and 3 h. Viable cell counts were determined by seeding diluted samples onto THY blood agar. The percent of the original inoculum was calculated as the number of CFU at the specified time point divided by the number of CFU in the initial inoculum.

Minimum inhibitory concentration and minimum bactericidal concentration assays

Minimum inhibitory concentration and minimum bactericidal concentration assays were performed as previously described51,69. For minimum inhibitory concentration and minimum bactericidal concentration assays, 0.5–1.0 × 104 bacteria were added into THY broth supplemented with twofold serial dilutions of penicillin G. Bacterial growth after 24 h at 37 °C in anaerobic conditions was spectrophotometrically measured at OD620. We defined the OD620 values <0.06 as complete inhibition of bacterial growth. To determine minimum bactericidal concentrations, we inoculated 5 µL of the bacterial cultures onto THY blood agar and incubated them at 37 °C under anaerobic conditions. The antimicrobial concentration at which no growth was detectable was defined as the minimum bactericidal concentration.

Mouse infection assays

All mouse experiments were conducted in accordance with animal protocols approved by the Animal Care and Use Committee of Osaka University Graduate School of Dentistry (28–002–0). Female CD-1 mice (Slc:ICR, 6 weeks old) were intranasally infected with 5 × 107 or 2 × 106 CFU of S. pneumoniae via the tail vein. Mouse survival was monitored for 14 days. At 24 h after intranasal infection, animals were euthanized by lethal intraperitoneal injection of sodium pentobarbital and lung tissue or BALF samples were collected. Bacterial counts in BALF were determined by plating serial dilutions. Lung tissue specimens were fixed with 4% formaldehyde, embedded in paraffin, and cut into sections that were stained with haematoxylin and eosin solution (Applied Medical Research, Osaka, Japan) and visualised with a BZ-X710 microscope (Keyence, Osaka, Japan). For the competition assay, CD-1 mice were intranasally infected with 20 µL of the mixture of wild-type (1.0 × 107 CFU) and ΔcbpJ (1.5 × 107 CFU) strains resuspended in PBS, in total, ~2.5 × 107 CFU. BALF samples were collected at 24 h after infection and bacterial counts in BALF were determined. Total and mutant strain CFUs were determined by serial dilution plating on THY blood agar with or without spectinomycin. The CFU number for the wild-type strain was calculated by subtracting that of the mutant strain from the total CFUs.

Quantitative real-time PCR

Quantitative real-time PCR was performed as previously described50,51, with minor modifications. Primers are listed in Supplementary Table 4. Total RNA of pneumococcal strains grown to the mid-log phase (OD600 = 0.4–0.5) was isolated with an RNeasy Mini kit (Qiagen) and RQ1 RNase-Free DNase (Promega, Madison, WI, USA), and cDNA was synthesised with SuperScript IV VILO Master Mix (Life Technologies, Carlsbad, CA, USA). Quantitative real-time PCR analysis was performed on a StepOnePlus Real-Time PCR system using Power SYBR Green Master PCR mix (Thermo Fisher Scientific). 16 S rRNA was used as a normalising control.

Statistical analysis

Statistical analysis of in vitro and in vivo data was performed with Mann–Whitney test, Kruskal–Wallis test with Dunn’s multiple comparisons test, Wilcoxon matched-paired signed rank test, and ordinary one-way ANOVA with Tukey’s multiple comparisons test. Mouse survival curves were compared with the log-rank test. Differences were considered statistically significant at P < 0.05. The tests were performed on Prism v.6.0 h or v.7.0d software (GraphPad Inc., La Jolla, CA, USA). All experiments were repeated at least three times. In the evolutionary analyses, P < 0.1 was regarded as a significant difference with the HyPhy default setting.

Reporting Summary

Further information on experimental design is available in the Nature Research Reporting Summary linked to this article.

Data availability

All data generated or analysed during this study are included in this published article and its supplementary data files.