Bacterial genome-wide association study of hyper-virulent pneumococcal serotype 1 identifies genetic variation associated with neurotropism

Hyper-virulent Streptococcus pneumoniae serotype 1 strains are endemic in Sub-Saharan Africa and frequently cause lethal meningitis outbreaks. It remains unknown whether genetic variation in serotype 1 strains modulates tropism into cerebrospinal fluid to cause central nervous system (CNS) infections, particularly meningitis. Here, we address this question through a large-scale linear mixed model genome-wide association study of 909 African pneumococcal serotype 1 isolates collected from CNS and non-CNS human samples. By controlling for host age, geography, and strain population structure, we identify genome-wide statistically significant genotype-phenotype associations in surface-exposed choline-binding (P = 5.00 × 10−08) and helicase proteins (P = 1.32 × 10−06) important for invasion, immune evasion and pneumococcal tropism to CNS. The small effect sizes and negligible heritability indicated that causation of CNS infection requires multiple genetic and other factors, reflecting a complex and polygenic aetiology. Our findings suggest that certain pathogen genetic variation modulates pneumococcal survival and tropism to CNS tissue, and therefore, virulence for meningitis.

Reviewer #1 (Remarks to the Author):
The hypothesis tested in this study was that allelic variation in selected genes explains the neurotropism of some clones of Streptococcus pneumoniae. This was tested by detailed comparative genomic analysis of 909 invasive isolates of the hyper-virulent pneumococcal serotype 1, all from the African continent. The strains were collected from hospitalized patients of any age, through hospital-based surveillance in West Africa, with infections in the central nervous system (CNS), i.e. cerebrospinal fluid (CSF) (N = 297), or in non-CNS tissues (blood, lungs, joints, peritoneum) (N = 612). To avoid bias, isolates from known outbreaks were not included. Extracted DNA from the strains was sequenced at the Wellcome Sanger Institute, UK, using Genome Analyser II and HiSeq 4000 Sequencing Systems. Appropriate quality control criteria were implemented. Based on sequence analyses, each strain was assigned to serotype, MLST sequence type (ST), and Global Pneumococcal Sequence Cluster (GPSC), and the entire collection was subjected to phylogenetic analysis. State-of-the-art genetic and statistical analyses identified two genome-wide significant unitigs whose presence/absence was associated with CNS isolates: one in the gene encoding pneumococcal surface protein C (pspC)/choline-binding protein A (cbpA), and one in the gene encoding the putative DnaQ family exonuclease or DinG family helicase, showing odds ratios of 0.70 and 0.71, respectively. In addition, above the threshold (odds ratio: 1.10) was the subunit S protein of a Type 1 restriction-modification system, previously shown to be an important regulator of capsule polysaccharide production, among other properties. Genome sequences are deposited in the European Nucleotide Archive (ENA). The hypothesis is important, and the study design is comprehensive and carefully performed. The results are interesting and of potential practical importance in attempts to prevent CNS infections. The manuscript is well written and the figures, including the supplementary figures, are of excellent standard.
Response: Thank you for the excellent summary of the findings and their relevance as well as limitations of the paper.
A few points should be considered by the authors.

Response:
We have now mentioned the suggested mechanism of action for the Type 1 restriction modification system in the revised manuscript in line 241.

Reviewer #2 (Remarks to the Author):
This is an important, large-scale study of a single serotype in the setting of endemic meningitis. The focus of the study on one serotype in a specific, large geographic region is novel and allows comparisons to be made to studies done in other geographical regions. The study aims to determine genetic variants associated with neurotropism that could be used for therapeutic or prophylactic interventions. The study succeeds in its aims to determine variants that may be associated with CNS infection but concedes that the biological picture is more complicated than a simple variant, as the effect size of the variants discovered is small and they do not show heritability. The results are still important, however, as the variants discovered are significant and can be the subject of further work.
I am by no means an expert in the statistical and bioinformatic methods used; however, it appears to me that they are appropriate and have been carefully controlled to reduce confounding influences. A significant amount of detail and supporting data is included, which should allow other researchers to replicate the work. Limitations of the study are clearly discussed, for example the possibility of non-CNS isolates simply being "in transit" to the CNS.
Response: Thank you for the excellent summary of the findings and their relevance as well as limitations of the paper.
I only have a few minor points to raise.
1. Figure 1a, while a nice way of presenting the study design, I'm unsure whether it shows anything that is not already described in the text. To me it is unclear whether the sizes or positions of the squares are meant to represent anything?
Response: This is an excellent assessment of the figure. The squares in Fig. 1a are of the same size but placed at different positions, as each line represents a different data type used in the GWAS analysis, namely SNPs, unitigs and gene presence/absence. We have now labelled the squares to show the data types that they represent for clarity.
2. Line 209 references Figure 3A. The paragraph is discussing significant unitigs associated with CNS isolates, but the figure referenced is the number of SNPs vs clade; the figure reference is meant to be 4A.
Response: Indeed, the reference was meant for Fig. 4a instead of Fig. 3a. We have now amended the text to reference Fig. 4a.
3. Figure 4A. Perhaps does not need the x-axis scale as at first I thought, because it was labelled unitig (ID), that these were the ID numbers. But actually, they are just a count of the unordered unitigs and the (ID) was referring to the annotated ID numbers.
Response: We concur with the reviewer's assessment. We have now removed the x-axis scale, which in this case was unnecessary, to avoid misunderstanding as pointed out by the reviewer.
4. Table 1 header - should it be unitigs rather than SNPs?
Response: Yes, Table 1's header should be unitigs rather than SNPs. We have now corrected the text.
5. Line 359 - should refer to Table 2.
Response: We have now added a reference to Table 2 at the suggested location corresponding to line 372 in the revised manuscript.
6. Fig S7 - The key is not shown; however, I assume the key is the same as figure S6(b).
Response: Thank you for highlighting this. We have now added the key to Supplementary Fig. 8 (previously Fig. S7).

Reviewer #3 (Remarks to the Author):
General Comments: After a careful and critical reading of this research study, it was easy to understand that the authors (Chaguza, et al.) are reporting for the first time important genetic associations modulating the tropism of the hyper-virulent Streptococcus pneumoniae serotype 1 into cerebrospinal fluid to cause central nervous system (CNS) infections, particularly meningitis. The authors compared 909 pneumococcal serotype 1 strains isolated from CNS (297) and non-CNS (612) human specimens, aiming to determine whether the presence of genetic variation(s) may contribute to our understanding of the ability of the pneumococcus to translocate across the blood-brain barrier to cause meningitis (neurotropism). The manuscript provides solid GWAS data and analysis of these pneumococcal serotype 1 isolates collected in sub-Saharan countries over 20 years. However, the impact of pneumococcal conjugate vaccine (PCV) introduction (selective pressure) is not considered in the analysis. Serotype 1 is included in PCV10 and PCV13 but was not included in the preceding PCV7 formulation. Is there any metadata about the vaccination status (post-PCV period) of the participants from whom pneumococci were recovered? Moreover, the most relevant result links a genetic variant (Unitig 8805) located in the PspC (CbpA, SpsA) coding region of the pneumococcal serotype 19A (Hungary 19A-14) genome with neurotropism, which is in particularly good agreement with the previous work of the co-author Dr. Aras Kadioglu (1) and other scientists in the field worldwide (2)(3)(4)(5)(6). A (supplementary) figure depicting the capsule 1 operon and the most relevant unitigs on a serotype 1 representative genome may be helpful. Highlighting the pspC gene, PspC protein, proline-rich domain and the unitig 8805 would contribute to a better understanding of the discussion at a molecular level, especially of how human antibodies would have protective effects and how its presence/absence can be beneficial/detrimental for a serotype 1-colonized person. Furthermore, PspC is a well distributed but highly variable gene and protein (7). Could the authors please show how conserved/variable the unitig 8805 is among their 909 sequenced serotype 1 strains? This is especially important in order to conclude or suggest the usefulness of this protein region for the design of protein-based vaccines to prevent pneumococcal CNS infections (8). Finally, the authors clearly describe laboratory methods, platforms, pipelines and statistical analysis that will be useful to the scientific community in terms of comparability and reproducibility.
Response: Thank you for your excellent summary of the paper and comments. The majority of the isolates were collected prior to the introduction of 10- or 13-valent pneumococcal conjugate vaccines (PCV) in Sub-Saharan Africa. Since sampling of the isolates was across all age groups, the majority of the isolates used in the analysis were from unvaccinated individuals. It is possible that some of the isolates may have come from vaccinated infants, but this would have been a small subset of the dataset for the reasons mentioned, although the individual vaccination status of infants sampled after vaccination is not available. As such, we are of the view that pneumococcal vaccination is highly unlikely to have biased our GWAS findings, considering that the PCVs target the polysaccharide capsule, which was identical in all the isolates analysed, rather than certain protein variants outside the capsule such as those identified by our analysis. We have now highlighted this information in lines 376-381 and provided the year of isolation for the isolates in Supplementary Data 1. As suggested, we have now included data showing the conservation of the genomic region containing the unitig 8805 in lines 313-316 and Supplementary Data 3, while the distribution of the unitig 8805 across the phylogenetic tree is shown in Fig. 5b. Furthermore, a diagram showing the locations of the capsule 1 operon/locus and the genome-wide significant unitigs has been included in Fig. 5a, as suggested.
Minor Comments: