Genome-wide analysis of Streptococcus pneumoniae serogroup 19 in the decade after the introduction of pneumococcal conjugate vaccines in Australia

The decline in invasive pneumococcal disease (IPD), following the introduction of the 7-valent pneumococcal conjugate vaccination (PCV-7), was tempered by emergence of non-vaccine serotypes, particularly 19A. In Australia, three years after PCV-7 was replaced by PCV-13, containing 19A and 19F antigens, serogroup 19 was still a prominent cause of IPD in children under five. In this study we examined the evolution of serogroup 19 before and after introduction of paediatric vaccines in New South Wales (NSW), Australia. Genomes of 124 serogroup 19 IPD isolates collected before (2004) and after introduction of PCV-7 (2008) and PCV-13 (2014), from children under five in NSW, were analysed. Eleven core genome sequence clusters (cgSC) and 35 multilocus sequence types (ST) were identified. The majority (78/124) of the isolates belonged to four cgSCs: cgSC7 (ST199), cgSC11 (ST320), cgSC8 (ST63) and cgSC9 (ST2345). ST63 and ST2345 were exclusively serotype 19A and accounted for its predominantly intermediate penicillin resistance; these two clusters first appeared in 2008 and largely disappeared after introduction of PCV-13. Serogroup 19 was responsible for the highest proportion of vaccine failures in NSW. Relatively low immunogenicity of serogroup 19 antigens and Australia’s three-dose vaccine schedule could affect the population dynamics of this serogroup.

19A were noted worldwide, often associated with high-level antibiotic resistance [8][9][10] . However, in Australia, most 19A IPD isolates were of intermediate penicillin resistance 11 . The 13-valent polysaccharide-protein conjugate vaccine (PCV-13), which included serotype 19A antigen, replaced the PCV-7 vaccine in Australia in 2011. However, three years later in 2014, 27% (38/143: 29 serotype 19A; nine serotype 19F) of IPD cases in children under five in Australia were caused by serogroup 19, despite high vaccine coverage 12 . Pneumococcal conjugate vaccine failures are uncommon (2%), but those that occur are often due to serotype 19F 13 . This may be due to the relatively poor immunogenicity of the serotype 19F antigen 14 and heterogeneity of the capsular (cps) biosynthesis locus in serogroup 19 which can reduce the spectrum of cross-protection provided by the serogroup 19 antigens contained in PCVs 15,16 .
Following introduction of PCV-7, 'serotype switching' or 'capsular switch' recombination was reported, particularly in serogroup 19 17 . These 'capsular switch' recombination events occurred when multi-drug resistant (MDR) clones recombined to switch their capsular type from a vaccine type (19F) to a non-vaccine type (19A) [16][17][18][19] . These recombination events were facilitated by genomic 'hotspots' on either side of the cps locus, which is flanked by the penicillin binding protein genes, pbp1a and pbp2x 20 , enabling pneumococci to evade both vaccine and antibiotic pressure. Recently, whole genome sequencing (WGS) has been used to examine this phenomenon, which is seen as a potential threat to the long-term efficacy of pneumococcal vaccines 21 . WGS provides higher resolution strain typing and can identify recombination and genomic variability more effectively than multilocus sequence typing (MLST) or multilocus variable number of tandem repeats analysis (MLVA).
In this study, we applied WGS and genome-wide analysis to examine temporal diversity of serotypes 19F and 19A IPD isolates from children under five in NSW, Australia, in the year before (2004) introduction of PCV-7 and three years after the introduction of both PCV-7 and PCV-13 vaccines (2008 and 2014, respectively). In order to contextualize our analysis we also compared the genome sequences of Australian isolates with those of 152 serogroup 19 isolates from post-vaccine carriage studies in the UK and US 16,22 .

Results
Temporal changes in pneumococcal serotypes. Clinical data were available for 118 of 124 (94%) serogroup 19 IPD cases reported in NSW. Although PCV-7 was not publicly funded for all Australian children until January 2005, it was available for high-risk and Aboriginal and Torres Strait Islander children (a very small proportion of the population) from mid-2001. The number and distribution of serogroup 19 IPD isolates from children under five years of age in NSW between 2004 and 2016 are shown in Table 1. Antibiotic susceptibilities of isolates collected during the study period are summarised in Table 2 and related clinical data in Table 3.
The total number of IPD cases in NSW, including those due to serogroup 19, fell significantly after introduction of each vaccine. However, the proportion of serogroup 19 isolates increased after the introduction of PCV-7 with an important shift from 19F to 19A as the predominant serotype (Table 1), and an associated significant increase in the proportion of penicillin intermediate isolates (Table 2). There were no significant increases in the incidence of human influenza cases during the years of the study (Supplemental Fig. S1). The subsequent introduction of PCV-13 led to an overall decrease in the proportion of serogroup 19 isolates, despite a small, but significant, increase in the proportion of 19F isolates (p < 0.01). In 2014, serogroup 19 isolates accounted for 29% (23/80) of isolates and, of these, 39% (9/23) were from cases of vaccine failure. The high prevalence of vaccine failures due to serogroup 19 continued and increased in 2015 (68% (13/19)) and 2016 (80% (8/10)) (Table 1). There were no significant differences in the age, clinical syndromes or immunisation histories of children with serogroup 19 IPD diagnosed in the three time periods of the study (Table 3).
(Table 4 and Fig. 3). Phenotypic antibiotic susceptibilities were displayed on the core genome phylogeny; Australian MDR isolates were contained only in cgSC11 (Fig. 2).

Multilocus sequence types (STs). The 124 IPD isolates from NSW included 35 STs (two novel STs and one isolate with a ddl deletion), as shown in Table 4 with the corresponding cgSCs. Five cgSCs included single STs (cgSCs 2, 4, 6, 8 and 10). However, the MLST did not accurately predict the homology of other cgSCs. No MLST alleles were shared between STs in polyphyletic cgSC1. The majority of isolates in cgSC11 were ST320 but other STs in this cluster did not share a common MLST allele. cgSCs 3, 5 and 7 shared a single common allele and STs in cgSC9 were triple locus variants (TLV). cgSC7 was the most diverse, containing eight distinct STs spread over all target years, which were predominantly serotype 19A in 2008. The number of STs among serotype 19F isolates decreased along with overall numbers, following the introduction of PCV-7.
Amino acid variation within the cps loci. Maximum likelihood phylogeny of the 15 proteins encoded by the cps locus genes showed considerable heterogeneity between isolates of both serotypes, and this heterogeneity did not change between target years (Fig. 3). In serotype 19F, the greatest variation was in amino acid sequences of glucose phosphate transferase (WchA or CpsE). Low homology among amino acid sequences of dTDP-4-dehydrorhamnose reductase (RmlD) between serotype 19A isolates probably resulted from a recombination event that caused an inversion of rmlD. Amino acid sequences of the 15 vaccine failure isolates (six 19F; nine 19A) did not cluster within a particular phylogenetic clade or share any conserved non-synonymous changes within either serotype. Isolates with the same cps phylogeny also harboured highly diverse cgSC profiles. These findings indicate that vaccine failures are unlikely to be caused by a particular change in the vaccine antigens or an individual cgSC.
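The locus-variant terminology used for the STs above (for example, triple locus variants) can be made concrete by counting the MLST loci at which two allelic profiles differ. The following is a minimal sketch in Python, not the method used in this study: the function names are ours, and the two allelic profiles are hypothetical values chosen only for illustration and do not correspond to any ST reported here.

```python
from typing import Dict

# The seven housekeeping loci of the pneumococcal MLST scheme.
MLST_LOCI = ("aroE", "gdh", "gki", "recP", "spi", "xpt", "ddl")


def loci_differing(profile_a: Dict[str, int], profile_b: Dict[str, int]) -> int:
    """Count the loci at which two allelic profiles carry different alleles."""
    return sum(1 for locus in MLST_LOCI if profile_a[locus] != profile_b[locus])


def variant_label(profile_a: Dict[str, int], profile_b: Dict[str, int]) -> str:
    """Describe the relationship between two profiles (SLV, DLV, TLV, ...)."""
    n = loci_differing(profile_a, profile_b)
    labels = {0: "identical", 1: "SLV", 2: "DLV", 3: "TLV"}
    return labels.get(n, f"{n}-locus variant")


# HYPOTHETICAL allelic profiles, for illustration only.
profile_x = {"aroE": 1, "gdh": 2, "gki": 3, "recP": 4, "spi": 5, "xpt": 6, "ddl": 7}
profile_y = {"aroE": 1, "gdh": 9, "gki": 3, "recP": 8, "spi": 5, "xpt": 6, "ddl": 10}

print(variant_label(profile_x, profile_y))  # three loci differ
```

Two STs with no shared MLST allele, as described for polyphyletic cgSC1, would differ at all seven loci under this count.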

Discussion
In this study we have shown that the evolution of serogroup 19 after the introduction of PCV vaccines was different in Australia from that described in some other countries 5,17,23 and was associated with less antibiotic resistance 11 . Our genomic data suggest that serogroup 19 antigen heterogeneity may contribute to vaccine failure due to serogroup 19, but the high rate in Australia cannot be explained by a single serotype 19A or 19F antigen variant or common cgSC.
Although widespread uptake of PCVs has significantly reduced the incidence of IPD in children under five years of age in Australia and elsewhere, serogroup 19 remained the predominant cause of IPD in NSW (13 of 62 [21%] cases in 2015-16; Table 1), despite the fact that 93.5% of Australian children were fully vaccinated with PCV-7 and/or PCV-13 by 5 years of age in 2015 24 . The success of PCV-7 vaccine was partially offset by the rapid emergence of serotype 19A; its incidence fell, and that of serotype 19F increased, after the introduction of PCV-13. Our findings indicate that the increased prevalence of intermediate penicillin resistance in serotype 19A 11 , post PCV-7, was largely due to emergence of cgSC8/ST63 and cgSC9/ST2345 and expansion of cgSC7/ST199, nearly all of which had intermediate penicillin resistance. Although cgSC8 and 9 were prominent among NSW isolates, particularly in 2008, they were rarely detected, if at all, among the published isolates from the USA and UK with which we compared them (cgSC8: USA 2/106, UK 0/48; cgSC9: USA 0/106, UK 0/48). Rather, serotype 19A cgSCs 7 and 11 were the predominant post-PCV-7 clones in the UK and USA, including high-level penicillin resistant clones cgSC7 (ST695) and MDR clone cgSC11 (ST320). There was only a single ST695 (cgSC6) isolate among our study isolates and, unlike those from the USA, it was not penicillin resistant. We note that USA and UK isolates in that comparison were colonising, rather than IPD, strains. However, it is unlikely that this difference would affect strain distribution significantly, and our findings are consistent with reported studies of post-PCV IPD serotype distribution in the UK and USA 10,25,26 . Both cgSC8 and 9 decreased substantially after the introduction of PCV-13, associated with a fall in overall intermediate penicillin resistance.
Following the removal of highly successful vaccine serotype clones in the northern hemisphere, the prevalence of a previously unrecognised clone, MDR 19A/ST320 (cgSC11 in our study) 8,25-27 , increased. This was apparently due to a capsular switch from 19F to 19A, which probably occurred independently of vaccine use 15 . In contrast, although MDR 19F/cgSC11/ST320 was present in NSW before the introduction of PCV-7, as shown in this study, its prevalence did not increase afterwards. The prominent 2008 serotype 19A clones in NSW, cgSC8/ST63 and cgSC9/ST2345 (Fig. 1), have intermediate, rather than high level, penicillin resistance. We identified small numbers of 'capsular switch' 19A/ST320 variants and closely related STs in cgSC11, in both 2008 and 2014 (Table 4). The phylogeny of cgSC11 illuminated the post-PCV emergence of serotype 19A pneumococci: serotype 19F isolates are contained in ancestral branches and serotype 19A in more divergent branches, which is consistent with all pre-PCV-7 cgSC11 isolates expressing serotype 19F, whereas most post-PCV cgSC11 isolates were 19A. Recent population studies of S. pneumoniae suggest that the rate at which recombination occurs can differ between strains; the highest recombination rate among all encapsulated pneumococci is in ST320/cgSC11, in which transformation events are predicted to occur every one to two years 16 . Antibiotic pressure can be a strong driver of recombination in pneumococci, whereby pbp genes in penicillin susceptible strains are replaced, in the presence of high antibiotic consumption, by pbp genes carrying mutations that confer penicillin resistance 20 . Several other countries, including Germany and Norway, which both have relatively low antibiotic usage 29 , have also reported a relatively low MDR ST320 prevalence, raising the possibility that its prevalence could be related to national antibiotic usage 9,30 .
However, this is an unlikely explanation of the low prevalence in Australia, where the antibiotic prescribing rates in the community are higher than in many European countries 29 , including France, which reported expansion of the MDR 19A/ST320 clone 31,32 .
Figure 3A demonstrates the amino acid sequence diversity of serotype 19A isolates; serotype 19F isolates are depicted in Fig. 3B. Outer rings show core genome sequence clusters (cgSC) of isolates (see Fig. 1A). Isolates with the same cps phylogeny harbour highly diverse cgSC profiles. For both serotypes, homology is noted between isolates collected in all years of the study. Stars indicate cases of vaccine failure, where subjects had received 3 doses of PCV-7 and/or PCV-13 (19F) and PCV-13 (19A).
Most cases of IPD due to serotype 19F in 2008 and 2014 were vaccine failures, including three of the four cases in 2008 and three of seven cases in 2014. A recent systematic review suggested that vaccine failures following PCV-7 were rare overall (2.1%), but 38% of them were caused by serotype 19F and they commonly occurred in children with underlying co-morbidities (42.9%) 13 . The vaccine failure rate due to 19F in our cohorts of children under 5 years was comparable to that in other studies (3% in 2008; 4% in 2014). Fewer international studies have reported vaccine failure rates after introduction of PCV-13. In our study, 11% (nine of 80) of IPD cases in 2014 were vaccine failures due to serotype 19A. In Australia overall, 23% (32 of 141) of IPD cases in 2014 were vaccine failures and, of these, half (16; 11% of total cases) were due to serotype 19A; the other main causes were serotypes 3 (10; 7%) and 19F (four; 3%) 12 . Vaccine failure rates in Australia exceeded post-PCV rates reported in France and Spain (<3%) 33,34 . Unlike other PCV vaccine failure studies, our data indicated that only three of 15 vaccine failures were in children with underlying co-morbidities.
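The vaccine failure percentages quoted above for 2014 follow directly from the counts given in the text. A minimal Python sketch of that arithmetic is shown below; the function name `percent` is ours and is used only for illustration, and rounding to the nearest whole per cent is an assumption about how the published figures were derived.

```python
def percent(numerator: int, denominator: int) -> int:
    """Return numerator/denominator as a percentage rounded to the nearest integer."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return round(100 * numerator / denominator)


# Counts quoted in the text for IPD cases in children under five in 2014.
national_cases = 141
national_failures = {"all": 32, "19A": 16, "3": 10, "19F": 4}
national_rates = {
    serotype: percent(count, national_cases)
    for serotype, count in national_failures.items()
}

# NSW study isolates: nine serotype 19A vaccine failures among 80 IPD cases.
nsw_19a_failure_rate = percent(9, 80)

print(national_rates)        # {'all': 23, '19A': 11, '3': 7, '19F': 3}
print(nsw_19a_failure_rate)  # 11
```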
The high vaccine failure rate due to serogroup 19 could be associated with heterogeneity of the serogroup 19 cps locus 35 , which may mean that immune responses to 19F and 19A strains used in PCV vaccines do not protect against all serogroup 19 strains. Diversity among cps loci found in this study was largely due to recombination within the 19A rml operon and 19F wchA gene, both of which have been demonstrated in serogroup 19 previously and are not thought to increase virulence or fitness 28,36 . Such within-serogroup recombinations have been recently reported to be more common than 'capsular switching' where the entire cps locus is replaced 28 . However, isolates from vaccine failures in this study did not share common amino acid sequence changes in serotype antigens or a common cgSC. This suggests that, while pathogen heterogeneity may contribute to vaccine failure, post-PCV evolution cannot fully explain the increased incidence of vaccine failures due to serogroup 19. On the other hand, higher anticapsular antibody titres are required for efficient opsonophagocytosis of serotype 19F strains 37,38 . This could explain the higher vaccine failure rate associated with Australia's three-dose vaccine schedule compared with those in countries with the more usual four-dose schedule 39 .
In conclusion, this is the first comparison of IPD in Australia before and after introduction of PCV using next generation sequencing. Although it was limited to a single Australian state (NSW has approximately one third of Australia's population) and selected years, our findings have shown significant evolution of serogroup 19 in response to introduction of pneumococcal conjugate vaccines, which differs from that reported in some other countries. Substantial replacement of serotype 19F with serotype 19A occurred following introduction of PCV-7, due to expansion of predominantly penicillin intermediate clones (mainly cgSCs 7, 8 and 9), which largely disappeared after the introduction of PCV-13. By contrast, in the USA and UK, emergence of serotype 19A was associated with high-level penicillin resistance (mainly cgSCs 7 and 11). We have documented higher rates of vaccine failure due to serogroup 19 than previously described elsewhere. While this may be partly attributed to a high level of diversity among serogroup 19 cps loci, we hypothesise that Australia's three-dose PCV-13 vaccine schedule, rather than the three-dose plus booster schedule used in many other countries, may have played a major role.

Materials and Methods
Source of isolates. All serogroup 19 IPD isolates from children under 5 years of age were referred by public and private pathology providers to the NSW Pneumococcal Reference Laboratory (PRL) in Sydney for public health surveillance of pneumococcal disease, including serotyping and molecular typing of isolates. Historical isolates collected in the years 2004, 2008 and 2014 were included in the study. The ATCC serotype 19F strain 49619 was sequenced as the reference. Two isolates were excluded from analysis in each of the 2008 and 2014 sets (i.e. two isolates could not be located, and two isolates had an incorrect serotyping result).
Clinical and demographic data. Data
Serotyping and phenotypic antibiotic susceptibility testing. Serotyping was performed by Neufeld's Quellung test, using pool, type and factor specific antisera (Statens Serum Institut, Copenhagen, Denmark). Minimum inhibitory concentrations (MICs) for penicillin, ceftriaxone and erythromycin were measured using broth microdilution and interpreted according to EUCAST criteria 11 . Multidrug resistant (MDR) isolates were defined as those that were phenotypically resistant to penicillin (MIC >2 mg/L), ceftriaxone and erythromycin.
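The MDR definition above can be expressed as a simple rule applied to the three MICs. The sketch below is illustrative only: the penicillin breakpoint (MIC >2 mg/L) is the one stated in the text, whereas the ceftriaxone and erythromycin breakpoints are placeholder assumptions that should be replaced with the values from the EUCAST table actually used. The class and function names are ours.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Breakpoints:
    """Resistance breakpoints in mg/L; an isolate is resistant if MIC > breakpoint."""
    penicillin: float = 2.0    # stated in the text (MIC >2 mg/L)
    ceftriaxone: float = 2.0   # ASSUMPTION: placeholder, check the EUCAST table used
    erythromycin: float = 0.5  # ASSUMPTION: placeholder, check the EUCAST table used


def is_mdr(mics: Dict[str, float], bp: Breakpoints = Breakpoints()) -> bool:
    """Return True if the isolate is resistant to all three agents."""
    return (
        mics["penicillin"] > bp.penicillin
        and mics["ceftriaxone"] > bp.ceftriaxone
        and mics["erythromycin"] > bp.erythromycin
    )


# HYPOTHETICAL MIC values (mg/L), for illustration only.
resistant_isolate = {"penicillin": 4.0, "ceftriaxone": 4.0, "erythromycin": 16.0}
intermediate_isolate = {"penicillin": 1.0, "ceftriaxone": 0.5, "erythromycin": 16.0}

print(is_mdr(resistant_isolate))     # True
print(is_mdr(intermediate_isolate))  # False
```

Under this rule an isolate with intermediate penicillin resistance is never classified as MDR, whatever its other MICs.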
Nucleic acid extraction and library preparation. Isolates were thawed from −80 °C storage in STGG (skim milk-tryptone-glucose-glycerin) and cultured on horse blood agar overnight at 37 °C, 5% CO2. Subculture was performed to ensure purity. A McFarland number 3 suspension was prepared in nucleic acid free H2O for DNA extraction. Extraction was performed using the Blood and Tissue Mini Kit (Qiagen, Australia) for Gram positive bacteria as per the manufacturer's instructions. DNA extracts were treated with 1 Unit of RNase. Total DNA concentration was quantified using Picogreen (Invitrogen, Australia) and 1 ng/µL of DNA was used to prepare DNA libraries in accordance with manufacturer's instructions employing the Nextera XT Library Preparation Kit. Multiplexed libraries were sequenced using paired-end 150-bp chemistry on the NextSeq 500 (Illumina, Australia). All methods were carried out in accordance with The University of Sydney's institutional guidelines and regulations.
International IPD genomes. Fastq files were downloaded from two international studies that investigated the changes in pneumococcal population structure after the introduction of PCV-7 and PCV-13 vaccination in the UK and USA 16,22 . A total of 152 serogroup 19 carriage isolates were included in our study (47 and 105 isolates from the UK and the USA studies, respectively); raw reads were processed through the bioinformatics pipeline as described below (European Nucleotide Archive study accessions: PRJEB2417 and PRJEB2632).

Bioinformatic analysis.
De-multiplexed sequencing reads were trimmed 40 , based on a minimum quality read score of 20, and trimmed reads were de novo assembled using SPAdes 41 . The quality of de novo assemblies was assessed with Quast 42 ; only contigs over 1000-bp in length with a minimum coverage of 50 reads were included in further analysis. Final contigs were annotated with Prokka 43 . MLST was inferred from final contigs using the S. pneumoniae MLST scheme (http://pubmlst.org/spneumoniae/). Core genome analysis was conducted using Roary: the pan genome pipeline 44 . Maximum-likelihood phylogeny of the pan genome was assessed with RAxML with 100 bootstrap replicates 45,46 . Core genome alignments were employed to cluster isolates using the hierarchical Bayesian Analysis of Population Structure (hierBAPS) software 47 . Phylogeny and metadata were visualised with Microreact and Inkscape 48 . Raw sequencing reads were deposited in the European Nucleotide Archive; project accession PRJEB28571 (Supplemental Table 1).
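The contig inclusion criteria described above (length over 1000 bp and a minimum coverage of 50) could be applied as in the following Python sketch. This is not the script used in the study. It assumes that length and coverage are read from SPAdes-style contig headers of the form `NODE_1_length_5000_cov_75.2`; the coverage value in such headers is a k-mer coverage estimate, so treating it as read coverage is a simplification. The function names and the example records are ours.

```python
import re
from typing import Iterable, Iterator, List, Tuple

# SPAdes-style contig header: NODE_<id>_length_<bp>_cov_<coverage>
HEADER_RE = re.compile(r"^NODE_\d+_length_(\d+)_cov_(\d+(?:\.\d+)?)")


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def keep_contig(header: str, min_len: int = 1000, min_cov: float = 50.0) -> bool:
    """Keep contigs longer than min_len bp with coverage of at least min_cov."""
    match = HEADER_RE.match(header)
    if match is None:
        return False  # header does not carry length/coverage information
    length, coverage = int(match.group(1)), float(match.group(2))
    return length > min_len and coverage >= min_cov


def filter_contigs(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return the contigs that pass both the length and coverage thresholds."""
    return [(h, s) for h, s in parse_fasta(lines) if keep_contig(h)]


# HYPOTHETICAL records, for illustration only (sequences truncated to a few bases).
example = [
    ">NODE_1_length_5000_cov_75.2", "ACGT",
    ">NODE_2_length_800_cov_120.0", "ACGT",
    ">NODE_3_length_2000_cov_12.5", "ACGT",
]
print([header for header, _ in filter_contigs(example)])
```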
Genomic variation in the pneumococcal capsular polysaccharides. The amino acid sequences of 15 genes between the putative transcriptional regulator YwtF (also known as wzg or cpsA) and the dTDP-4-dehydrorhamnose reductase gene (strL or rfbD) were extracted from contigs using BLAST+ and defined as the capsular biosynthesis locus that encodes the pneumococcal capsular polysaccharide (cps) 49 . The sequences of cps genes were concatenated and aligned using the multiple sequence alignment tool MAFFT. Maximum likelihood phylogeny was constructed by RAxML software with 100 bootstrap replicates 45,46 . Visualization and comparison of potential recombination sites was performed using BratNextGen 50 .
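The concatenation step described above requires that the cps proteins of every isolate are joined in the same gene order before alignment, so that homologous positions line up. A minimal Python sketch of that step is given below. It is an illustration under stated assumptions rather than the study's code: the function name is ours, only three of the 15 genes are shown, and the short amino acid strings are hypothetical.

```python
from typing import Dict, Sequence


def concatenate_proteins(proteins: Dict[str, str], gene_order: Sequence[str]) -> str:
    """Join one isolate's protein sequences in a fixed gene order.

    Raises KeyError if any gene in gene_order is missing, so that isolates
    with incomplete loci are not silently concatenated out of register.
    """
    missing = [gene for gene in gene_order if gene not in proteins]
    if missing:
        raise KeyError(f"missing cps proteins: {', '.join(missing)}")
    return "".join(proteins[gene] for gene in gene_order)


# Three genes named in the text, used here as a shortened example order.
GENE_ORDER = ("wzg", "wchA", "rmlD")

# HYPOTHETICAL amino acid sequences, for illustration only.
isolate_proteins = {"rmlD": "MIL", "wzg": "MSR", "wchA": "MDE"}

print(concatenate_proteins(isolate_proteins, GENE_ORDER))  # MSRMDEMIL
```

The concatenated sequences, one per isolate, would then be written to a single FASTA file for alignment.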