Introduction

Streptococcus pneumoniae is a highly recombinant Gram-positive bacterium that can be a commensal or a pathogen in the respiratory tract1. Invasive pneumococcal disease (IPD) occurs when virulent strains enter the bloodstream and cause a range of life-threatening syndromes, most importantly pneumonia, meningitis and sepsis. The burden of IPD is greatest in children under five years of age and the elderly, in whom it is associated with significant morbidity. Relatively few of the more than 90 serotypes of pneumococci are commonly associated with IPD, and the polysaccharide capsule is considered the most important pneumococcal virulence factor2.

Antigens of the seven most common IPD-associated serotypes, including 19F, were included in the first polysaccharide-protein conjugate vaccine (PCV-7) that was publicly funded in 2005 for all Australian children under five years of age. The incidence of IPD declined dramatically, due to individual and herd immunity3,4,5, in countries where PCV-7 was widely used. In Australia IPD notifications in children under five fell from 55.4 to 16.8 cases per 100,000 population between 2004 and 20086,7. However, this success was tempered by an increase in the absolute numbers and proportions of non-vaccine serotypes causing IPD, particularly 19A. In New South Wales (NSW), the most populous state of Australia, the number and proportion of cases due to 19A increased from nine of 245 (4%) in 2004 to 56 of 99 (57%) in 20086. Similar increases in IPD incidence caused by serotype 19A were noted worldwide, often associated with high-level antibiotic resistance8,9,10. However, in Australia, most 19A IPD isolates were of intermediate penicillin resistance11.

The 13-valent polysaccharide-protein conjugate vaccine (PCV-13), which included serotype 19A antigen, replaced the PCV-7 vaccine in Australia in 2011. However, three years later in 2014, 27% (38/143: 29 serotype 19A; nine serotype 19F) of IPD cases in children under five in Australia were caused by serogroup 19, despite high vaccine coverage12. Pneumococcal conjugate vaccine failures are uncommon (2%), but those that occur are often due to serotype 19F13. This may be due to the relatively poor immunogenicity of the serotype 19F antigen14 and heterogeneity of the capsular (cps) biosynthesis locus in serogroup 19 which can reduce the spectrum of cross-protection provided by the serogroup 19 antigens contained in PCVs15,16.

Following introduction of PCV-7, ‘serotype switching’ or ‘capsular switch’ recombination was reported, particularly in serogroup 1917. These ‘capsular switch’ recombination events occurred when multi-drug resistant (MDR) clones recombined to switch from a vaccine capsular type (19F) to a non-vaccine capsular type (19A)16,17,18,19. These recombination events were facilitated by genomic ‘hotspots’ on either side of the cps locus, which is flanked by the penicillin binding protein genes, pbp1a and pbp2x20, enabling pneumococci to evade both vaccine and antibiotic pressure. Recently, whole genome sequencing (WGS) has been used to examine this phenomenon, which is seen as a potential threat to the long-term efficacy of pneumococcal vaccines21. WGS provides higher resolution strain typing and can identify recombination and genomic variability more effectively than multilocus sequence typing (MLST) or multilocus variable number of tandem repeats analysis (MLVA).

In this study, we applied WGS and genome-wide analysis to examine temporal diversity of serotypes 19F and 19A IPD isolates from children under five in NSW, Australia, in the year before (2004) introduction of PCV-7 and three years after the introduction of both PCV-7 and PCV-13 vaccines (2008 and 2014, respectively). In order to contextualize our analysis we also compared the genome sequences of Australian isolates with those of 152 serogroup 19 isolates from post-vaccine carriage studies in the UK and US16,22.

Results

Temporal changes in pneumococcal serotypes

Clinical data were available for 118 of 124 (94%) serogroup 19 IPD cases reported in NSW. Although PCV-7 was not publicly funded for all Australian children until January 2005, it was available for high-risk and Aboriginal and Torres Strait Islander children (a very small proportion of the population) from mid-2001. The number and distribution of serogroup 19 IPD isolates from children under five years of age in NSW between 2004 and 2016 are shown in Table 1. Antibiotic susceptibilities of isolates collected during the study period are summarised in Table 2 and related clinical data in Table 3.

Table 1 Serogroup 19 isolates from children <5 years of age, in NSW.
Table 2 Phenotypic antibiotic susceptibilities of serogroup 19 isolates from children <5 years of age, in NSW.
Table 3 Clinical and vaccination data from IPD patients, <5 years of age in NSW, from whom IPD serogroup 19 isolates were isolated.

The total number of IPD cases in NSW, including those due to serogroup 19, fell significantly after introduction of each vaccine. However, the proportion of serogroup 19 isolates increased after the introduction of PCV-7, with an important shift from 19F to 19A as the predominant serotype (Table 1) and an associated significant increase in the proportion of penicillin intermediate isolates (Table 2). There were no significant increases in the incidence of human influenza cases during the years of the study (Supplemental Fig. S1). The subsequent introduction of PCV-13 led to an overall decrease in the proportion of serogroup 19 isolates, despite a small, but significant, increase in the proportion of 19F isolates (p < 0.01). In 2014, serogroup 19 isolates accounted for 29% (23/80) of isolates and, of these, 39% (9/23) were from cases of vaccine failure. The high prevalence of vaccine failures due to serogroup 19 continued and increased in 2015 (68% (13/19)) and 2016 (80% (8/10)) (Table 1). There were no significant differences in the age, clinical syndromes or immunisation histories of children with serogroup 19 IPD diagnosed in the three time periods of the study (Table 3).

Core genome sequence clusters (cgSCs)

Nucleotide alignments of 1330 core genes identified from 277 assembled genomes (124 from NSW, 105 from USA, 47 from UK and ATCC reference strain 49619) were used to construct a maximum likelihood phylogeny (Fig. 1). Eleven cgSCs were identified: one polyphyletic cluster (cgSC1) and 10 monophyletic clusters (cgSCs 2–11). The distribution of cgSCs and their relationship to multilocus sequence types (STs) are shown in Table 4. The majority of monophyletic clusters contained both 19F and 19A isolates from all three years of the study; exceptions were cgSCs 8 and 9, which first appeared in 2008 and contained only 19A, and cgSC7, which predominantly contained serotype 19A in post-vaccine isolates. There were 15 documented vaccine failure cases of IPD in fully vaccinated children (six attributed to 19F; nine to 19A), i.e. they had received three doses of either PCV-7 or PCV-13 (19F isolates) or PCV-13 (19A isolates) (Table 1). The 15 isolates were distributed among cgSCs 2, 3, 5, 7, 9 and 11 (Table 4 and Fig. 3). When phenotypic antibiotic susceptibilities were displayed on the core genome phylogeny, Australian MDR isolates were found only in cgSC11 (Fig. 2).
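As a rough illustration of what "core genes" means in this analysis, the sketch below selects genes present in at least a given fraction of genomes from a gene presence/absence mapping. The function name `core_genes`, the toy data and the 99% default are assumptions made for illustration; the threshold used in this study is not stated in the text.

```python
def core_genes(presence, min_fraction=0.99):
    """Return the sorted list of genes found in at least min_fraction of genomes.

    presence: dict mapping a genome name to the set of gene names it carries.
    min_fraction: assumed cut-off (not taken from the study).
    """
    n_genomes = len(presence)
    counts = {}
    for genes in presence.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1
    return sorted(g for g, c in counts.items() if c / n_genomes >= min_fraction)


# Hypothetical toy input: three genomes, three genes.
toy = {
    "isolate_1": {"geneA", "geneB", "geneC"},
    "isolate_2": {"geneA", "geneB"},
    "isolate_3": {"geneA", "geneB", "geneC"},
}
print(core_genes(toy))                    # genes shared by all three genomes
print(core_genes(toy, min_fraction=0.6))  # a looser, illustrative cut-off
```

Only the genes that pass the cut-off would then be aligned and concatenated for phylogenetic inference.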

Figure 1

Maximum likelihood phylogeny of the 1330-gene core genome (cg) nucleotide alignment from 277 assembled genomes (124 from NSW, Australia, 105 from USA, 47 from UK and ATCC reference strain). Colour of isolate nodes denotes year collected and source (blue 2004; green 2008; orange 2014; white post-PCV carriage isolates from UK and USA and black ATCC Streptococcus pneumoniae strain 49619). Maximum likelihood phylogeny is separated into eleven sequence clusters (cgSCs) indicated by the colour of the first metadata block column. Serotypes of the isolates are denoted by the lower metadata block (black 19F; grey 19A).

Table 4 Core genome sequence clusters (cgSCs), multi-locus sequence types (STs) and serotypes of NSW IPD isolates in pre- and post-vaccine years.
Figure 2

Phenotypic susceptibility of 277 serogroup 19 IPD isolates illustrated on the core genome phylogeny. Red nodes – multidrug resistant (MDR) isolates, i.e. phenotypically resistant to penicillin (MIC >2 µg/mL), ceftriaxone and erythromycin. Orange nodes – penicillin resistant isolates (MIC >2 µg/mL). Yellow nodes – penicillin intermediate resistant isolates (MIC 0.12–2 µg/mL). Green nodes – penicillin susceptible isolates (MIC <0.06 µg/mL). White nodes – genomes from the international studies without phenotypic susceptibility data. The first column of metadata blocks depicts source of the isolate (black IPD isolates; grey international carriage isolates and purple ATCC Streptococcus pneumoniae strain 49619). The second column depicts year of isolation (blue 2004; green 2008; orange 2014; white post-PCV international carriage isolates and black ATCC Streptococcus pneumoniae strain 49619).

Multilocus sequence types (STs)

The 124 IPD isolates from NSW included 35 STs (two novel STs and one isolate with a ddl deletion), as shown in Table 4 with the corresponding cgSCs. Five cgSCs included single STs (cgSCs 2, 4, 6, 8 and 10). However, MLST did not accurately predict the homology of the other cgSCs. No MLST alleles were shared between STs in polyphyletic cgSC1. The majority of isolates in cgSC11 were ST320, but other STs in this cluster did not share a common MLST allele. STs within cgSCs 3, 5 and 7 shared a single common allele, and STs in cgSC9 were triple locus variants (TLV). cgSC7 was the most diverse, containing eight distinct STs spread over all target years, which were predominantly serotype 19A in 2008.
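The relationships between STs described above (shared alleles, triple locus variants) rest on comparing seven-locus allelic profiles. A minimal sketch of that comparison follows; the loci are those of the pneumococcal MLST scheme, while the function names and the two allelic profiles are hypothetical and are not taken from Table 4.

```python
# The seven housekeeping loci of the pneumococcal MLST scheme.
MLST_LOCI = ("aroE", "gdh", "gki", "recP", "spi", "xpt", "ddl")


def locus_differences(profile_a, profile_b):
    """Count the loci at which two allelic profiles carry different alleles."""
    return sum(1 for locus in MLST_LOCI if profile_a[locus] != profile_b[locus])


def variant_label(profile_a, profile_b):
    """Label a pair of profiles as identical, SLV, DLV, TLV or more distant."""
    n = locus_differences(profile_a, profile_b)
    labels = {0: "identical", 1: "SLV", 2: "DLV", 3: "TLV"}
    return labels.get(n, f"{n} loci differ")


# Hypothetical allele numbers, for illustration only.
st_x = {"aroE": 1, "gdh": 2, "gki": 3, "recP": 4, "spi": 5, "xpt": 6, "ddl": 7}
st_y = {"aroE": 1, "gdh": 2, "gki": 9, "recP": 8, "spi": 5, "xpt": 6, "ddl": 11}
print(variant_label(st_x, st_y))
```

Two STs that differ at every locus share no alleles, which is the situation described for the STs in polyphyletic cgSC1.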

The number of STs among serotype 19F isolates decreased along with overall numbers, following the introduction of PCV-7 (12 STs in 2004 versus two in 2008) while the number of 19A isolates and STs increased (seven STs in 2004 versus 13 in 2008). The number of STs detected in 2014 decreased again for serotype 19A isolates, from 13 in 2008 to 10 in 2014; STs detected for serotype 19F isolates increased from two in 2008 to four in 2014.

Amino acid variation within the cps loci

Maximum likelihood phylogeny of the 15 proteins encoded by the cps locus genes showed considerable heterogeneity between isolates of both serotypes, and this heterogeneity did not change between target years (Fig. 3). In serotype 19F, the greatest variation was in amino acid sequences of glucose phosphate transferase (WchA or CpsE). Low homology among amino acid sequences of dTDP-4-dehydrorhamnose reductase (RmlD) between serotype 19A isolates probably resulted from a recombination event that caused an inversion of rmlD. Amino acid sequences of the 15 vaccine failure isolates (six 19F; nine 19A) did not cluster within a particular phylogenetic clade or share any conserved non-synonymous changes within either serotype. Isolates with the same cps phylogeny also harboured highly diverse cgSC profiles. These findings indicate that vaccine failures are unlikely to be caused by a particular change in the vaccine antigens or an individual cgSC.

Figure 3

Maximum likelihood phylogenetic analysis of protein sequences from the 15 genes that form the cps locus from isolates collected in 2004 (blue nodes), 2008 (green nodes), and 2014 (orange nodes). Fig. 3A shows the amino acid sequence diversity of serotype 19A isolates and Fig. 3B that of serotype 19F isolates. Outer rings show core genome sequence clusters (cgSC) of isolates (see Fig. 1A). Isolates with the same cps phylogeny harbour highly diverse cgSC profiles. For both serotypes, homology is noted between isolates collected in all years of the study. Stars indicate cases of vaccine failure, where subjects had received 3 doses of PCV-7 and/or PCV-13 (19F) or PCV-13 (19A).

Discussion

In this study we have shown that the evolution of serogroup 19 after the introduction of PCV vaccines was different in Australia from that described in some other countries5,17,23 and was associated with less antibiotic resistance11. Our genomic data suggest that serogroup 19 antigen heterogeneity may contribute to vaccine failure due to serogroup 19, but the high rate in Australia cannot be explained by a single serotype 19A or 19F antigen variant or a common cgSC.

Although widespread uptake of PCVs has significantly reduced the incidence of IPD in children under five years of age in Australia and elsewhere, serogroup 19 remained the predominant cause of IPD in NSW (13 of 62 [21%] cases in 2015–16; Table 1), despite the fact that 93.5% of Australian children were fully vaccinated with PCV-7 and/or PCV-13 by 5 years of age in 201524. The success of PCV-7 vaccine was partially offset by the rapid emergence of serotype 19A; its incidence fell, and that of serotype 19F increased, after the introduction of PCV-13. Our findings indicate that the increased prevalence of intermediate penicillin resistance in serotype 19A11, post PCV-7, was largely due to emergence of cgSC8/ST63 and cgSC9/ST2345 and expansion of cgSC7/ST199, nearly all of which had intermediate penicillin resistance. Although cgSC8 and 9 were prominent among NSW isolates, particularly in 2008, they were rarely detected, if at all, among the published isolates from the USA and UK with which we compared them (cgSC8 - USA 2/106; UK 0/48; cgSC9 - USA 0/106; UK 0/48). Rather, serotype 19A cgSCs 7 and 11 were the predominant post-PCV-7 clones in the UK and USA, including high-level penicillin resistant clones cgSC7 (ST695) and MDR clone cgSC11 (ST320). There was only a single ST695 (cgSC6) isolate among our study isolates and, unlike those from the USA, it was not penicillin resistant. We note that USA and UK isolates in that comparison were colonising, rather than IPD, strains. However, it is unlikely that this difference would affect strain distribution significantly, and our findings are consistent with reported studies of post-PCV IPD serotype distribution in the UK and USA10,25,26. Both cgSC8 and 9 decreased substantially after the introduction of PCV-13, associated with a fall in overall intermediate penicillin resistance.

Following the removal of highly successful vaccine serotype clones, in the northern hemisphere, the prevalence of a previously unrecognised clone MDR 19A/ST320 (cgSC11 in our study)8,25,26,27 increased. This was apparently due to a capsular switch from 19F to 19A, which probably occurred independently of vaccine use15,16,28. In contrast, although MDR 19F/cgSC11/ST320 was present in NSW before the introduction of PCV-7, as shown in this study, its prevalence did not increase afterwards. The prominent 2008 serotype 19A clones in NSW, cgSC8/ST63 and cgSC9/ST2345 (Fig. 1) have intermediate, rather than high level, penicillin resistance. We identified small numbers of ‘capsular switch’ 19A/ST320 variants and closely related STs in cgSC11, in both 2008 and 2014 (Table 4). The phylogeny of cgSC11 illuminated the post-PCV emergence of serotype 19A pneumococci: serotype 19F isolates are contained in ancestral branches and serotype 19A in more divergent branches, which is consistent with all pre-PCV-7 cgSC11 isolates expressing serotype 19F, whereas most post-PCV cgSC11 isolates were 19A. Recent population studies of S. pneumoniae suggest that the rate at which recombination occurs can differ between strains; the highest recombination rate among all encapsulated pneumococci is in ST320/cgSC11, in which transformation events are predicted to occur every one to two years16.

Antibiotic pressure can be a strong driver of recombination in pneumococci whereby pbp genes in penicillin susceptible strains are replaced by pbp genes carrying mutations that confer penicillin resistance, in the presence of high antibiotic consumption20. Several other countries, including Germany and Norway, which both have relatively low antibiotic usage29, have also reported a relatively low MDR ST320 prevalence, raising the possibility that its prevalence could be related to national antibiotic usage9,30. However, this is an unlikely explanation of the low prevalence in Australia, where the antibiotic prescribing rates in the community are higher than in many European countries29, including France, which reported expansion of the MDR 19A/ST320 clone31,32.

Most cases of IPD due to serotype 19F in 2008 and 2014 were vaccine failures, including three of the four cases in 2008 and three of seven cases in 2014. A recent systematic review suggested that vaccine failures following PCV-7 were rare overall (2.1%), but 38% of them were caused by serotype 19F and they commonly occurred in children with underlying co-morbidities (42.9%)13. The vaccine failure rate due to 19F in our cohorts of children under 5 years was comparable to that in other studies (3% in 2008; 4% in 2014). Fewer international studies have reported vaccine failure rates after introduction of PCV-13. In our study, 11% (nine of 80) of IPD cases in 2014 were vaccine failures due to serotype 19A. In Australia overall, 23% (32 of 141) of IPD cases in 2014 were vaccine failures and, of these, half (16; 11% of total cases) were due to serotype 19A; the other main causes were serotypes 3 (10; 7%) and 19F (four; 3%)12. Vaccine failure rates in Australia exceeded post-PCV rates reported in France and Spain (<3%)33,34. Unlike other PCV vaccine failure studies, our data indicated that only three of 15 vaccine failures were in children with underlying co-morbidities.

The high vaccine failure rate due to serogroup 19 could be associated with heterogeneity of the serogroup 19 cps locus35, which may mean that immune responses to 19F and 19A strains used in PCV vaccines do not protect against all serogroup 19 strains. Diversity among cps loci found in this study was largely due to recombination within the 19A rml operon and 19F wchA gene, both of which have been demonstrated in serogroup 19 previously and are not thought to increase virulence or fitness28,36. Such within-serogroup recombinations have been recently reported to be more common than ‘capsular switching’, where the entire cps locus is replaced28. However, isolates from vaccine failures in this study did not share common amino acid sequence changes in serotype antigens or a common cgSC. This suggests that, while pathogen heterogeneity may contribute to vaccine failure, post-PCV evolution cannot fully explain the increased incidence of vaccine failures due to serogroup 19. On the other hand, higher anticapsular antibody titres are required for efficient opsonophagocytosis of serotype 19F strains37,38. This could explain the higher vaccine failure rate associated with Australia’s three-dose vaccine schedule compared with those in countries with the more usual four-dose schedule39.

In conclusion, this is the first comparison of IPD in Australia before and after introduction of PCV using next generation sequencing. Although it was limited to a single Australian state (NSW has approximately one third of Australia’s population) and selected years, our findings have shown significant evolution of serogroup 19 in response to introduction of pneumococcal conjugate vaccines, which differs from that reported in some other countries. Substantial replacement of serotype 19F with serotype 19A occurred following introduction of PCV-7, due to expansion of predominantly penicillin intermediate clones (mainly cgSCs 7, 8 and 9), and largely disappeared after the introduction of PCV-13. By contrast, in the USA and UK, emergence of serotype 19A was associated with high-level penicillin resistance (mainly cgSCs 7 and 11). We have documented higher rates of vaccine failure due to serogroup 19 than previously described elsewhere. While this may be partly attributed to a high level of diversity among serogroup 19 cps loci, we hypothesise that Australia’s three-dose PCV-13 vaccine schedule, rather than the three-dose plus booster schedule used in many other countries, may have played a major role.

Materials and Methods

Source of isolates

Isolates were all serogroup 19 IPD isolates from children under 5 years of age that had been referred by public and private pathology providers to the NSW Pneumococcal Reference Laboratory (PRL) in Sydney for public health surveillance of pneumococcal disease, including serotyping and molecular typing of isolates. Historical isolates collected in the years 2004, 2008 and 2014 were included in the study. The ATCC serotype 19F strain 49619 was sequenced as the reference. Two isolates were excluded from analysis in each of the 2008 and 2014 sets (i.e. two isolates could not be located, and two isolates had an incorrect serotyping result).

Clinical and demographic data

Data for each case, including age, sex, diagnosis, specimen type, vaccination history and comorbidities, were retrieved from the NSW Health Notifiable Conditions Information Management System (NCIMS), Communicable Diseases Branch, Health Protection NSW. Vaccine failure cases were defined as IPD occurring in children who had received three doses of PCV-7 or PCV-13 (for vaccine failures due to 19F) or PCV-13 (19A). The study was approved by the NSW Population & Health Services Research Ethics Committee (HREC/18/CIPHS/28). A two-tailed Fisher’s exact test was used to assess differences in IPD cases between study years.
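For readers who wish to reproduce this kind of comparison, the following is a minimal, standard-library sketch of a two-tailed Fisher’s exact test on a 2×2 table. It is not the authors’ code, and the function name is an assumption. The example counts are the 19A figures quoted in the Introduction (nine of 245 cases in 2004; 56 of 99 in 2008); whether this exact table was among those tested in the study is not stated.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    The p-value is the total probability of all tables with the same margins
    that are no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )


# Rows: 2004 and 2008; columns: serotype 19A cases and all other cases.
p_value = fisher_exact_two_sided(9, 245 - 9, 56, 99 - 56)
print(f"p = {p_value:.3g}")
```

Summing the probabilities of tables that are no more likely than the observed one is one common definition of the two-sided p-value; other definitions exist and can give slightly different results.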

Serotyping and phenotypic antibiotic susceptibility testing

Serotyping was performed by Neufeld’s Quellung test, using pool, type and factor specific antisera (Statens Serum Institut, Copenhagen, Denmark). Minimum inhibitory concentrations (MICs) for penicillin, ceftriaxone and erythromycin were measured using broth microdilution and interpreted according to EUCAST criteria11. Multidrug resistant (MDR) isolates were defined as those that were phenotypically resistant to penicillin (MIC >2 mg/L), ceftriaxone and erythromycin.
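The categories used in this study can be expressed as a short sketch. The penicillin cut-offs are those quoted in this paper (resistant, MIC >2 mg/L; intermediate, 0.12–2 mg/L; susceptible below that range). Ceftriaxone and erythromycin breakpoints are not given in the text, so the sketch takes those results as already-interpreted flags; the function names are hypothetical.

```python
def penicillin_category(mic_mg_per_l):
    """Classify a penicillin MIC (mg/L) with the cut-offs quoted in this paper.

    Assumption: any value below 0.12 mg/L is treated as susceptible, since
    doubling dilutions step directly from 0.06 to 0.12 mg/L.
    """
    if mic_mg_per_l > 2:
        return "resistant"
    if mic_mg_per_l >= 0.12:
        return "intermediate"
    return "susceptible"


def is_mdr(penicillin_mic, ceftriaxone_resistant, erythromycin_resistant):
    """MDR as defined in this study: resistant to all three agents."""
    return bool(
        penicillin_category(penicillin_mic) == "resistant"
        and ceftriaxone_resistant
        and erythromycin_resistant
    )


for mic in (0.03, 0.12, 2, 4):
    print(mic, penicillin_category(mic))
```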

Nucleic acid extraction and library preparation

Isolates were thawed from −80 °C storage in STGG (skim milk-tryptone-glucose-glycerin) and cultured on horse blood agar overnight at 37 °C, 5% CO2. Subculture was performed to ensure purity. A McFarland number 3 suspension was prepared in nucleic acid free H2O for DNA extraction. Extraction was performed using the Blood and Tissue Mini Kit (Qiagen, Australia) for Gram-positive bacteria as per the manufacturer’s instructions. DNA extracts were treated with 1 Unit of RNase. Total DNA concentration was quantified using Picogreen (Invitrogen, Australia), and 1 ng/µL of DNA was used to prepare DNA libraries with the Nextera XT Library Preparation Kit in accordance with the manufacturer’s instructions. Multiplexed libraries were sequenced using paired-end 150-bp chemistry on the NextSeq 500 (Illumina, Australia). All methods were carried out in accordance with The University of Sydney’s institutional guidelines and regulations.

International IPD genomes

Fastq files were downloaded from two international studies that investigated the changes in pneumococcal population structure after the introduction of PCV-7 and PCV-13 vaccination in the UK and USA16,22. A total of 152 serogroup 19 carriage isolates were included in our study (47 and 105 isolates from the UK and the USA studies, respectively); raw reads were processed through the bioinformatics pipeline as described below (European Nucleotide Archive study accessions: PRJEB2417 and PRJEB2632).

Bioinformatic analysis

De-multiplexed sequencing reads were trimmed, based on a minimum quality read score of 20, and trimmed40 reads were de novo assembled using SPAdes41. The quality of de novo assemblies was assessed with Quast42; only contigs over 1000 bp in length with a minimum coverage of 50 reads were included in further analysis. Final contigs were annotated with Prokka43. MLST was inferred from final contigs using the S. pneumoniae MLST scheme (http://pubmlst.org/spneumoniae/). Core genome analysis was conducted using Roary: the pan genome pipeline44. Maximum-likelihood phylogeny of the pan genome was assessed with RAxML with 100 bootstrap replicates45,46. Core genome alignments were employed to cluster isolates using the hierarchical Bayesian Analysis of Population Structure (hierBAPS) software47. Phylogeny and metadata were visualised with Microreact and Inkscape48. Raw sequencing reads were deposited in the European Nucleotide Archive; project accession PRJEB28571 (Supplemental Table 1).
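The contig filter described above (length over 1000 bp, coverage of at least 50) can be sketched as follows. The sketch assumes SPAdes-style contig headers of the form `NODE_<n>_length_<bp>_cov_<value>`; it is not the authors’ pipeline, the function names are hypothetical, and the `cov` field reported by the assembler is a k-mer coverage estimate, so treating it as read depth is an approximation.

```python
import re

# Assumed SPAdes-style header, e.g. ">NODE_1_length_5000_cov_80.2".
HEADER = re.compile(r"^>?NODE_\d+_length_(\d+)_cov_([\d.]+)")


def keep_contig(header, min_length=1000, min_coverage=50.0):
    """Return True if the contig is longer than min_length and has
    coverage of at least min_coverage, judged from its header."""
    match = HEADER.match(header)
    if match is None:
        raise ValueError(f"unrecognised contig header: {header!r}")
    length = int(match.group(1))
    coverage = float(match.group(2))
    return length > min_length and coverage >= min_coverage


def filter_fasta(lines, min_length=1000, min_coverage=50.0):
    """Yield the FASTA lines of contigs that pass both thresholds."""
    keep = False
    for line in lines:
        if line.startswith(">"):
            keep = keep_contig(line, min_length, min_coverage)
        if keep:
            yield line


# Hypothetical three-contig assembly.
example = [
    ">NODE_1_length_5000_cov_80.2", "ACGT",
    ">NODE_2_length_800_cov_120.0", "GG",
    ">NODE_3_length_3000_cov_12.5", "TT",
]
print(list(filter_fasta(example)))
```

Reading the thresholds from the header avoids recomputing lengths, but it relies on the assembler’s naming convention being left unchanged by any intermediate step.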

Genomic variation in the pneumococcal capsular polysaccharides

The amino acid sequences of the 15 genes between the putative transcriptional regulator YwtF (also known as wzg or cpsA) and the dTDP-4-dehydrorhamnose reductase gene (strL or rfbD) were extracted from contigs using BLAST+ and defined as the capsular biosynthesis locus that encodes the pneumococcal capsular polysaccharide (cps)49. The sequences of the cps genes were concatenated and aligned using the multiple sequence alignment tool MAFFT. Maximum likelihood phylogeny was constructed by RAxML software with 100 bootstrap replicates45,46. Visualization and comparison of potential recombination sites was performed using BratNextGen50.