Streptococcus pneumoniae is a major cause of meningitis, bacteremia, and pneumonia worldwide and is responsible for over 300,000 deaths annually in children under 5 years of age (ref. 1). Although pneumococcal conjugate vaccines (PCVs) have substantially reduced the burden of invasive pneumococcal disease (IPD) in children and adults, breakthrough disease and serotype replacement (i.e., an increase in the frequency of non-vaccine serotypes) are ongoing concerns1,2,3.

The first PCV approved for use in children targeted seven serotypes (PCV7), and subsequent PCVs expanded the valency to 10 (PCV10) and 13 (PCV13) serotypes4. Higher-valency PCVs are being developed to target an even larger number of serotypes5. Ideally, the efficacy of these new PCVs would be evaluated in randomized controlled trials (RCTs). However, conducting RCTs is not feasible for several reasons: if populations are already using PCVs, the number of events would be too small for meaningful evaluations and comparisons between products. Additionally, placebo-controlled trials are not considered ethical when an effective vaccine is available6. Therefore, new PCVs, starting with PCV10, have been evaluated via their immunogenicity and approved by the FDA if they elicit an immune response that is non-inferior to that of existing PCVs. For pediatric vaccines, the primary outcome used for evaluating new PCVs is the concentration of immunoglobulin G (IgG) that targets the pneumococcal capsule. Because licensure decisions are based on IgG concentrations, this is the primary basis of comparison between new and existing PCVs until vaccine effectiveness (VE) studies are performed (typically several years after the introduction of the new PCVs). Inevitably, technical advisory groups and decision-makers will try to interpret differences in immunogenicity between vaccines and make inferences about effectiveness. For instance, in a comparison of PCV13 with PCV7, the IgG response was weaker for all of the PCV7 serotypes following the primary doses of PCV13, and weaker after the booster dose for all of the PCV7 serotypes except 19F (ref. 7).

While IgG concentrations are the standard basis of comparison, decision-makers are interested in the implications of these differences for VE. Differences in immunogenicity between PCVs can be misinterpreted. There is therefore a need for a framework that contextualizes differences in PCVs and attempts to express any differences in terms of VE.

The standard practice in the pneumococcal community has been to interpret an IgG concentration of 0.35 µg/mL as a correlate of protection against IPD, with that same correlate of protection used for all serotypes8. This is based on a method by Siber et al. that determines the protective concentration of IgG for the PCVs, based on the vaccine efficacy measured in clinical trials9. The correlate of protection actually varies by serotype and population9,10, and this variation needs to be taken into account when projecting VE based on differences in immunogenicity.

In the present analysis, we leverage and advance Siber’s method to calculate protective concentrations (Cp) using real-world effectiveness data and then apply Siber’s method “in reverse” to show that a known Cp can be used to predict the effectiveness of higher valency vaccines, and that this can be done in a serotype-specific manner using summary-level data. The objective of this work was to assess the utility of this method for predicting the serotype-specific effectiveness of next-generation PCVs. The assessment is achieved by predicting PCV13’s effectiveness against the PCV7 serotypes and then comparing the predicted effectiveness values with corresponding published values.

Results

Serotype-specific protective antibody concentration

Serotype-specific Cp values for the PCV7 serotypes were calculated using the source data from each country (Table 1) as input to the calculations described under “Methods” below. Median values ranged from 0.08 (serotype 6B) to 1.27 (serotype 19F) µg/mL in the United Kingdom, from 0.64 (serotype 23F) to 6.08 (serotype 19F) µg/mL in Australia, and from 0.08 (serotype 6B) to 2.96 (serotype 4) µg/mL in Germany.

Table 1 Predicted serotype-specific protective antibody concentration for the PCV7 serotypes in PCV13.

Serotype-specific vaccine effectiveness

Serotype-specific VE values were predicted for each of the seven serotypes shared between PCV7 and PCV13 using the immunogenicity data from each country (United Kingdom, Australia, and Germany) along with the Cp values calculated in the first step (Table 2).

Table 2 Predicted serotype-specific vaccine effectiveness for the PCV7 serotypes in PCV13.

Aggregate vaccine effectiveness

No serotype-specific values of VE for PCV13 were available for comparison, so previously reported aggregate values (for the types in PCV7) were used as a standard. The predicted aggregate VE values are shown in Table 3. The median values for each prediction were close to the previously observed values for the United Kingdom (93% predicted versus 90% reported; ref. 10), Australia (71% versus 70%; ref. 11), and Germany (91% versus 90%; ref. 12).

Table 3 Predicted versus observed aggregate vaccine effectiveness for the PCV7 serotypes in PCV13.

Lower bounds of the confidence intervals deviated more widely from the observations in both the United Kingdom and Germany. This deviation arises because the VE prediction method captures more variability in VE overall than was observed. The increased variability is reflected at both bounds of the prediction, but because the upper bound cannot exceed 100%, the deviation is less pronounced there. The deviation at the lower bound of the aggregate prediction is due to the high variability in the serotype-specific VE input data, which in some cases range from negative values to 100% (Table 4). Where effectiveness input values are highly variable, the estimated protective concentration is highly variable, which in turn leads to a highly variable effectiveness prediction. For Australia, the predicted lower bound of the confidence interval was higher than the reported value of −8%. The reported −8% may not be realistic, since it suggests that PCV13 increases IPD; it most likely reflects the variability resulting from a relatively small number of cases for one or more of the included serotypes. Furthermore, for our method to predict a negative VE, the placebo reverse cumulative distribution curve (RCDC) must show a higher percentage of subjects meeting or exceeding a given IgG concentration than the PCV13-treated arm, which was not observed for any serotype in PCV13. Based on the observed placebo and PCV13 geometric mean concentrations (GMCs) and distributions, the predicted 3% lower bound may be more realistic.

Table 4 Summary-level input data.

To evaluate the performance of predicted VE and the importance of Cp values, we reran the simulations with the commonly accepted 0.35 µg/mL protective threshold applied to every serotype. By comparison, the serotype-specific Cp values in Table 1 range from 0.08 to 6.08 µg/mL. The changes in the resulting VE (Table 5; relative to values estimated using the fixed value of Cp = 0.35 µg/mL) show that proper, serotype-specific Cp estimation is needed to estimate serotype-specific VE: where the protective thresholds differ substantially from 0.35 µg/mL, the predicted VE values also differ substantially (Table 5). Estimation of serotype-specific Cp also results in better alignment between predicted and reported aggregate (incidence rate-weighted) VE (Table 5).

Table 5 Predicted serotype-specific vaccine effectiveness for the PCV7 serotypes in PCV13: Estimations using serotype-specific protective threshold compared to the estimations using pan-serotype 0.35 µg/mL correlate of protection.

Discussion

With the widespread use of effective PCVs in the pediatric population, it is no longer feasible or ethical to perform placebo-controlled clinical efficacy studies for a new vaccine against IPD. Thus, current and future trials will continue to measure only the immune titers induced by a new vaccine. Therefore, evaluation of the potential impact of new PCVs on public health requires a method by which real-world effectiveness data can be reliably predicted from the immunogenicity data. As next-generation PCVs are developed in the coming years, this capability will be increasingly important in order to contextualize differences in immunogenicity between vaccines in terms of expected impact on public health. These modeled estimates provide a bridge between immunogenicity data and VE but should be used with caution and need to be verified with post-licensure evaluations of VE.

The model presented here enables serotype-specific estimates of Cp and VE, thus allowing predictions for and comparisons between current and future vaccines. The current standard practice is to use the aggregate Cp value of 0.35 µg/mL, derived for PCV7 serotypes, to predict and compare VE not only for the original seven serotypes but also for additional serotypes whose efficacy has not been shown in trials. Andrews et al. show that the aggregate value is an imprecise predictor of the probable effectiveness of individual serotypes10. Performance of predicted VE was also evaluated: estimation of serotype-specific Cp results in better alignment between estimated and reported aggregate (incidence rate-weighted) VE (Table 5) than use of the 0.35 µg/mL aggregate, as also suggested by Andrews et al.10.

Therefore, the serotype-specific estimates for Cp and VE obtained using the method described here provide a more accurate prediction of the probable protection afforded by PCV13 for the serotypes in common with PCV7. This modeling method has the potential to better estimate the effectiveness of next-generation PCVs against the serotypes shared with the current PCV and, thus, to better inform public health decisions.

Several limitations of the derivation of Cp and effectiveness described here should be kept in mind. The Cp and effectiveness predictions apply only to the prevention of IPD in children who resemble the trial populations. Generalizability to other populations is limited because effectiveness depends not only on the strength of the immune response the vaccine elicits but also on other factors13, including age at vaccination and the time interval between vaccination and serum sampling, which were found to explain 17–20% of the variance in antibody response to the serotypes in PCV7 and PCV13 (ref. 14). Geographic differences in the immune response to each serotype are also evident, with higher responses in children from South Africa than in children living in the United States9. Such differences could have both genetic and environmental components and are likely to depend also on dosing schedule and PCV valency15,16. Previous exposure to the serotypes could also differ substantially between countries and even within sub-populations, which may affect vaccine response and partial protection in unvaccinated populations, resulting in a relative shift in effectiveness that could also change dynamically with fluctuations in the relative incidence rates of circulating serotypes.

In addition to these considerations, several underlying assumptions also limit the real-world applicability of our method. The effectiveness prediction does not use a functional assay output, such as the pneumococcal opsonophagocytic killing assay, but instead relies on IgG concentration. However, previous work (Siber et al.9) demonstrates the utility of IgG in pediatric populations, which mitigates this concern. Furthermore, opsonophagocytic killing assay titers were not used in this work because of the relatively large assay variability (both between laboratories and over time) and the absence of a standard comparator assay, like the WHO ELISA for IgG concentrations, which is primarily used for licensure decisions and against which concordance can be calculated for titer normalization. Effectiveness is also the result of both uptake (percent of individuals vaccinated) and efficacy (relative risk reduction in a 100% vaccinated group relative to placebo recipients, randomized from an appropriately representative population). It can further depend on secondary effects, including a reduced force of infection such as through “herd immunity,” and on changes over time in vaccine uptake or relative prevalence of serotypes (and concomitant changes in cross-protection). Additionally, this method assumes that equivalent antibody concentrations elicited by different PCVs (e.g., PCV7, PCV10, PCV13) for a specific serotype yield equivalent levels of protection against disease caused by that serotype. Current data across different manufacturers suggest this is a reasonable assumption (i.e., it is consistent with available data), but it needs to be verified with post-licensure VE studies.

In addition to methodological limitations, there were data limitations. One-month post-primary infant placebo IgG concentrations were unavailable from the United Kingdom, Australia, and Germany, as no efficacy study was run in these countries. Placebo data were instead taken from a PCV7 trial done in the United States, on the assumption that placebo immune responses in the infants in that study closely mirror placebo immune responses in infants in the United Kingdom, Australia, and Germany. One-month post-primary vaccination data for subjects given either PCV7 or PCV13 in a 3 + 0 regimen were also unavailable for the Australian population. Trials from the United States were used instead because their primary infant series dosing regimen of 2, 4, and 6 months is the same as the Australian primary infant series represented by the 3 + 0 regimen, and the populations are assumed to have similar immune responses (IgG concentrations) to PCV7 and PCV13. Last, effectiveness had to be used rather than efficacy, as randomized controlled trials were not run for the vaccines, regions, and time periods of interest.

The method described here can be used to calculate the serotype-specific protective concentrations of antibodies elicited by PCVs, as well as their serotype-specific effectiveness. To qualify the method, we applied it to calculate protective concentrations and effectiveness of PCV13 in three different geographic locations (United Kingdom, Australia, and Germany) using each country’s respective PCV7 serotype-specific effectiveness as input, as well as immunogenicity data that reflected the dosing regimen used to estimate the PCV7 effectiveness. No serotype-specific effectiveness has been reported for PCV13 against the PCV7 serotypes (4, 6B, 9V, 14, 18C, 19F, and 23F), but aggregate effectiveness was reported against these seven serotypes for PCV13, and this aggregate was compared to the predictions. The serotype-specific predictions were aggregated (weighting by relative incidence rates), and the aggregated results agreed with the previously reported data, qualifying the method.

Using currently available population-level data, the method can predict serotype-specific effectiveness in next-generation PCVs. As next-generation PCVs are developed in the coming years, it will be important to estimate the shared serotype-specific effectiveness to contextualize differences in immunogenicity between vaccines in terms of expected effectiveness and to identify whether next-generation vaccines will maintain (or, possibly, improve) control of serotypes that are currently controlled well. The serotype-specific effectiveness predictions may also be useful in dynamic transmission modeling to assess the potential for breakthrough disease, especially in higher-risk persistent serotypes (e.g., serotypes 3 and 19A in Europe3,17).

Methods

Study design and data sources

The two-step method is illustrated in Fig. 1. Step 1 is based on the method described by Siber et al.9 for calculating the protective antibody concentration Cp when vaccine efficacy/effectiveness is known. Step 2 involves the calculation of VE based on Cp. The method relies on previously reported serotype-specific values of VE for PCV7 and on the concentration of antibodies raised against each of the PCV7 serotypes in placebo recipients and in PCV7- and PCV13-treated subjects one month after the primary series. The one-month post-primary series timepoint was chosen because it was used to derive, and is still used with, the current correlate of protection (0.35 µg/mL), and because it reflects the immune response elicited within the first year of life, when children are at the highest risk of IPD. The immunogenicity data were obtained from publicly available summaries of clinical trials, while the VE data were drawn from real-world evaluations of the vaccines (Table 4, and Supplemental Tables 1 and 2). PCV7 VE data were available from the United Kingdom10, Australia11, and Germany18. Immunogenicity data for PCV7 and PCV13 associated with the dosing regimens for the VE data inputs were obtained from the United Kingdom19,20 and Germany21. Immunogenicity data for PCV7 and PCV13 under the Australian 3 + 0 dosing regimen were from a United States pediatric population9,22. Placebo data were also from a United States pediatric population9. The method was accordingly applied to data from the United Kingdom, Australia, and Germany.

Fig. 1: Modeling flow chart.

The flow chart illustrates the two-step method for predicting the vaccine effectiveness (VE) of the PCV7 serotypes in PCV13. Step 1 (left panel) begins with the known serotype-specific IgG concentrations after PCV7 vaccination (both placebo and active vaccine), from which reverse cumulative distribution curves (RCDCs) are simulated. These are combined with the known serotype-specific VE of PCV7, where VE ≈ 1 − (pv/pc), pv is the percentage of subjects with antibody levels below the protective antibody concentration (Cp) in the vaccinated cohort, and pc is the percentage of subjects with antibody levels below Cp in the control cohort, to derive the Cp that makes 1 − (pv/pc) agree with the reported VE for each serotype in PCV7 (red arrow in the left panel). In step 2 (right panel), RCDCs are simulated for the PCV7 serotypes using PCV13 recipients’ serotype-specific immunogenicity data, and the Cp values derived in step 1 are used to estimate the VE for those serotypes (see arrows in right panel).

VE data used for comparison to the predicted PCV13 VE are drawn from real-world evaluations of PCV13 in each country10,11,12.

Derivation of the serotype-specific protective antibody concentration

In step 1, the known serotype-specific IgG concentrations after vaccination with PCV7 (both placebo and vaccine; Table 4) are used to simulate RCDCs, which plot the percentage of subjects at, or above, a given IgG concentration (Fig. 1). As subject-level IgG concentration data for placebo and PCV7 from the infant trials in the United Kingdom, Australia, and Germany were not available, individual IgG concentrations for placebo, PCV7, and PCV13 were simulated from summary-level published data (Table 4). The one-month post-primary series (for which the definition is country-specific) GMC and geometric standard deviation (GSD) for each serotype were used to simulate individual IgG concentrations from a log-normal distribution. The number of subjects simulated was based on the number of subjects in the respective vaccine treatment arm.
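The simulation of subject-level concentrations and the resulting RCDC can be sketched as follows. This is a minimal illustration in Python rather than the R code used for the analyses; the function names `simulate_igg` and `rcdc` and all numeric inputs are hypothetical and are not taken from Table 4.

```python
import math
import random


def simulate_igg(gmc, gsd, n, rng):
    """Draw n IgG concentrations (µg/mL) from a log-normal distribution
    with geometric mean `gmc` and geometric standard deviation `gsd`."""
    mu = math.log(gmc)      # mean of log-concentration
    sigma = math.log(gsd)   # SD of log-concentration
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


def rcdc(concentrations, threshold):
    """Reverse cumulative distribution: percentage of subjects whose
    IgG concentration is at or above `threshold`."""
    at_or_above = sum(1 for c in concentrations if c >= threshold)
    return 100.0 * at_or_above / len(concentrations)


rng = random.Random(2024)  # fixed seed so the sketch is reproducible

# Hypothetical summary-level inputs (GMC, GSD, arm size), for illustration only.
vaccine_arm = simulate_igg(gmc=2.0, gsd=2.5, n=250, rng=rng)
placebo_arm = simulate_igg(gmc=0.05, gsd=3.0, n=250, rng=rng)

for threshold in (0.1, 0.35, 1.0):
    print(threshold, rcdc(vaccine_arm, threshold), rcdc(placebo_arm, threshold))
```

Evaluating `rcdc` over a fine grid of thresholds traces out the full curve for each arm.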

In addition to the serotype-specific IgG concentrations, Step 1 requires serotype-specific VE (Table 4) as input. As in Siber et al., VE is related to antibody concentration by the approximation:

$$\mathrm{VE} \approx 1 - \left( p_{\mathrm{v}} / p_{\mathrm{c}} \right), \tag{1}$$

where pv is the percentage of subjects with antibody levels less than Cp in the vaccinated cohort, pc is the percentage of subjects with antibody levels less than Cp in the control cohort, and Cp is the protective antibody concentration in µg/mL (ref. 9). In this work, Eq. 1 uses the placebo and vaccinated RCDCs to give an estimate of VE at every IgG concentration C. Cp is the IgG concentration C at which that estimated VE is equal to the known (published) VE.
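A minimal sketch of solving for Cp is given below. It departs from the procedure described here in one respect: instead of simulated RCDCs it uses the closed-form log-normal distribution function, which keeps the example deterministic. The names `fraction_below`, `ve_at`, and `solve_cp`, the search bounds, and the numeric inputs are all hypothetical. The bisection also assumes that the estimated VE falls monotonically as C increases, which holds when both arms share the same GSD but is not guaranteed otherwise.

```python
import math
from statistics import NormalDist


def fraction_below(c, gmc, gsd):
    """Fraction of a log-normal population (geometric mean `gmc`,
    geometric SD `gsd`) with IgG concentration below `c`."""
    return NormalDist(math.log(gmc), math.log(gsd)).cdf(math.log(c))


def ve_at(c, vaccine, placebo):
    """VE ≈ 1 - pv/pc evaluated at candidate threshold `c`.
    `vaccine` and `placebo` are (gmc, gsd) tuples."""
    return 1.0 - fraction_below(c, *vaccine) / fraction_below(c, *placebo)


def solve_cp(target_ve, vaccine, placebo, lo=1e-3, hi=1e2, tol=1e-6):
    """Find the concentration at which the estimated VE equals `target_ve`,
    by bisection on the log-concentration scale.
    Assumes VE decreases monotonically with c on [lo, hi]."""
    if not ve_at(hi, vaccine, placebo) <= target_ve <= ve_at(lo, vaccine, placebo):
        raise ValueError("target VE is not bracketed by the search interval")
    log_lo, log_hi = math.log(lo), math.log(hi)
    while log_hi - log_lo > tol:
        mid = 0.5 * (log_lo + log_hi)
        if ve_at(math.exp(mid), vaccine, placebo) > target_ve:
            log_lo = mid   # VE still too high: threshold must be larger
        else:
            log_hi = mid
    return math.exp(0.5 * (log_lo + log_hi))


# Hypothetical inputs: (GMC, GSD) for each arm and a published VE of 90%.
vaccine = (2.0, 3.0)
placebo = (0.05, 3.0)
cp = solve_cp(0.90, vaccine, placebo)
print(round(cp, 3), round(ve_at(cp, vaccine, placebo), 4))
```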

A number of simplifying assumptions were incorporated into step 1. The results assume that once the effects of antibody concentration are accounted for (by the protection model), the resulting efficacy and effectiveness are no longer dependent on the vaccination regimen (also known as “conditional independence”)23. They further assume that the relationship between the immune response and the probability of IPD is a step function (i.e., the probability of IPD is zero in subjects with serum antibody ≥Cp), that the antibody concentration measured 4 weeks after the primary series immunization of infants predicts up to 5-year effectiveness, and that the reported serotype-specific GMCs and variance from the trials for placebo reflect the true population GMC and variance. It was also assumed that the proportions of subjects who missed vaccine doses were similar between cases and controls in the PCV7 VE publications and that the distribution of IgG concentrations from the study was representative of typical antibody concentrations 4 weeks after the primary immunization, regardless of adherence. Finally, the reported PCV7 and PCV13 serotype-specific VE values were obtained during observation periods spanning 2000–2010 (for PCV7) and 2010–2016 (for PCV13). It was assumed that VE and force of infection remained unchanged across serotypes over those periods. It was further assumed that vaccine efficacy and the value of the Cp did not change (e.g., due to serotype replacement, change of circulating types giving cross-protection, etc.).
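As a short worked sketch, the step-function assumption leads directly to the approximation VE ≈ 1 − (pv/pc). The symbols $r_{\mathrm{v}}$ and $r_{\mathrm{c}}$ (IPD risk in the vaccinated and control cohorts) and $r_0$ (risk among subjects below Cp, taken here to be the same in both cohorts) are introduced only for this illustration and do not appear in the source:

$$r_{\mathrm{v}} = r_0 \, \frac{p_{\mathrm{v}}}{100}, \qquad r_{\mathrm{c}} = r_0 \, \frac{p_{\mathrm{c}}}{100} \quad \Longrightarrow \quad \mathrm{VE} = 1 - \frac{r_{\mathrm{v}}}{r_{\mathrm{c}}} = 1 - \frac{p_{\mathrm{v}}}{p_{\mathrm{c}}}.$$

Under these assumptions the common risk $r_0$ cancels, so only the two percentages below Cp are needed, and a negative VE would require $p_{\mathrm{v}} > p_{\mathrm{c}}$.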

Estimation of serotype-specific vaccine effectiveness

In step 2 (Fig. 1), Eq. 1 is applied to predict the VE of PCV13 for each serotype in common with PCV7. RCDCs are simulated for these serotypes based on the reported serotype-specific IgG concentrations in PCV13 recipients (active vaccine and placebo; Table 4). The Cp values previously derived for these serotypes are then used to predict (using Eq. 1) the VE values for those serotypes. Here, it was assumed that IgG concentration is the only factor affecting VE for next-generation PCVs.
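The prediction step can be sketched as follows, again using the closed-form log-normal distribution function in place of simulated RCDCs. The names `fraction_below` and `predict_ve`, the Cp value, and the (GMC, GSD) inputs are hypothetical and chosen only to show how a lower GMC for the same serotype translates into a lower predicted VE.

```python
import math
from statistics import NormalDist


def fraction_below(c, gmc, gsd):
    """Fraction of a log-normal population with IgG concentration below `c`."""
    return NormalDist(math.log(gmc), math.log(gsd)).cdf(math.log(c))


def predict_ve(cp, vaccine, placebo):
    """Predicted VE ≈ 1 - pv/pc at a previously derived threshold `cp`.
    `vaccine` and `placebo` are (gmc, gsd) tuples for the new vaccine
    and the placebo arm."""
    p_v = fraction_below(cp, *vaccine)
    p_c = fraction_below(cp, *placebo)
    return 1.0 - p_v / p_c


cp = 0.5                # hypothetical serotype-specific Cp (µg/mL) from step 1
placebo = (0.05, 3.0)   # hypothetical placebo GMC and GSD

# Two hypothetical immune responses to the same serotype.
for label, gmc in (("higher GMC", 2.0), ("lower GMC", 1.5)):
    ve = predict_ve(cp, (gmc, 3.0), placebo)
    print(label, round(100 * ve, 1))
```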

Statistical procedures and assessment of predicted vaccine effectiveness

For each serotype/vaccine combination, the published GMC, the published VE, and their GSDs (calculated from the 95% confidence intervals) were used as inputs for Monte Carlo simulation (10,000 iterations). Each simulation sampled a new GMC and VE from a log-normal distribution; uncertainty in the GSD was accounted for by generating GSD estimates assuming a chi-square distribution.
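One way the input sampling could look is sketched below for the GMC and GSD only. Several details are assumptions rather than the documented procedure: the published interval is treated as a normal-approximation 95% interval on the log scale, the GSD is recovered from it using the arm size, and GSD uncertainty is drawn through the usual chi-square relationship for a sample variance. The names `gsd_from_ci`, `draw_inputs`, and `summarize` and all numeric inputs are hypothetical. Sampling of VE is left out because the scale on which it was sampled is not specified here.

```python
import math
import random
import statistics

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def gsd_from_ci(lower, upper, n):
    """Recover the geometric SD from a 95% CI of a GMC based on n subjects
    (assumes a normal-approximation interval on the log scale)."""
    se_log = (math.log(upper) - math.log(lower)) / (2 * Z95)
    return math.exp(se_log * math.sqrt(n))


def draw_inputs(gmc, lower, upper, n, rng):
    """Draw one plausible (GMC, GSD) pair for a Monte Carlo iteration."""
    se_log = (math.log(upper) - math.log(lower)) / (2 * Z95)
    sigma_log = se_log * math.sqrt(n)
    gmc_draw = rng.lognormvariate(math.log(gmc), se_log)
    # A chi-square variate with n-1 degrees of freedom is Gamma(shape=(n-1)/2, scale=2).
    chi2 = rng.gammavariate((n - 1) / 2, 2.0)
    sigma_draw = sigma_log * math.sqrt((n - 1) / chi2)
    return gmc_draw, math.exp(sigma_draw)


def summarize(values):
    """Median with 2.5% and 97.5% bounds."""
    ordered = sorted(values)
    lo = ordered[int(0.025 * (len(ordered) - 1))]
    hi = ordered[int(0.975 * (len(ordered) - 1))]
    return statistics.median(ordered), lo, hi


rng = random.Random(7)
# Hypothetical published GMC of 2.0 µg/mL (95% CI 1.80 to 2.22) from 250 subjects.
draws = [draw_inputs(2.0, 1.80, 2.22, 250, rng) for _ in range(10_000)]
print(summarize([g for g, _ in draws]))
print(summarize([s for _, s in draws]))
```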

Each simulation first solved for a serotype-specific Cp based on the RCDCs for placebo and PCV7 and the published VE values for PCV7 (step 1). We then solved for the serotype-specific VE values of PCV13 based on the Cp from step 1 and the simulated RCDCs for placebo and PCV13 (step 2). Simulated Cp (µg/mL) and VE (%) were summarized as medians with 95% confidence intervals.

Because only aggregate (not serotype-specific) values of VE were available in the literature for qualification of our results, the serotype-specific values calculated by our method were combined by weighting into aggregate medians with 2.5% and 97.5% bounds. Weighting was based on the relative incidence rate of IPD caused by each serotype in PCV7, obtained from the source publications on VE10,11,18. The predicted VE values and confidence intervals were compared with the published values and confidence intervals.
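The weighting can be illustrated with a short sketch. It assumes that the aggregate is formed within each Monte Carlo iteration as an incidence-weighted mean and then summarized across iterations; whether the original analysis aggregated in this order is not stated. The function name `aggregate_ve`, the incidence rates, and the VE values are hypothetical.

```python
import statistics


def aggregate_ve(ve_by_serotype, incidence_by_serotype):
    """Incidence-weighted mean of serotype-specific VE values (in %)."""
    total = sum(incidence_by_serotype[s] for s in ve_by_serotype)
    weighted = sum(ve_by_serotype[s] * incidence_by_serotype[s] for s in ve_by_serotype)
    return weighted / total


# Hypothetical relative incidence rates for three serotypes.
incidence = {"4": 1.0, "14": 3.0, "19F": 1.0}

# Hypothetical serotype-specific VE (%) from three Monte Carlo iterations.
iterations = [
    {"4": 95.0, "14": 90.0, "19F": 60.0},
    {"4": 97.0, "14": 85.0, "19F": 75.0},
    {"4": 92.0, "14": 93.0, "19F": 40.0},
]

aggregates = [aggregate_ve(it, incidence) for it in iterations]
print(statistics.median(aggregates))
```

Serotypes with higher incidence pull the aggregate toward their own VE, which is why a single variable serotype with few cases can widen the aggregate bounds.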

These analyses were performed in the R statistical software version 4.1.0.