Streptococcus pyogenes causes 700 million human infections annually worldwide, yet, despite a century of intensive effort, there is no licensed vaccine against this bacterium. Although a number of large-scale genomic studies of bacterial pathogens have been published, the relationships among the genome, transcriptome, and virulence in large bacterial populations remain poorly understood. We sequenced the genomes of 2,101 emm28 S. pyogenes invasive strains, from which we selected 492 phylogenetically diverse strains for transcriptome analysis and 50 strains for virulence assessment. Data integration provided a novel understanding of the virulence mechanisms of this model organism. Genome-wide association study, expression quantitative trait loci analysis, machine learning, and isogenic mutant strains identified and confirmed a one-nucleotide indel in an intergenic region that significantly alters global transcript profiles and ultimately virulence. The integrative strategy that we used is generally applicable to any microbe and may lead to new therapeutics for many human pathogens.
Whole-genome sequencing data for the 2,101 isolates studied have been deposited in the NCBI Sequence Read Archive under BioProject accession number PRJNA434389. The slightly updated complete genome sequence of the emm28 reference strain MGAS6180 (GenBank accession number CP000056) has been deposited in the NCBI GenBank database under the same accession number. Transcriptome data have been deposited in the Gene Expression Omnibus under accession GSE113058. The data that support the findings of this study are available from the corresponding author upon request.
Beres, S. B. et al. Transcriptome remodeling contributes to epidemic disease caused by the human pathogen Streptococcus pyogenes. mBio 7, e00403-16 (2016).
Chewapreecha, C. et al. Comprehensive identification of single nucleotide polymorphisms associated with beta-lactam resistance within pneumococcal mosaic genes. PLoS Genet. 10, e1004547 (2014).
Fernandez-Romero, N. et al. Uncoupling between core genome and virulome in extraintestinal pathogenic Escherichia coli. Can. J. Microbiol. 61, 647–652 (2015).
Long, S. W. et al. Population genomic analysis of 1,777 extended-spectrum beta-lactamase-producing Klebsiella pneumoniae isolates, Houston, Texas: unexpected abundance of clonal group 307. mBio 8, e00489-17 (2017).
Mukherjee, S. et al. 1,003 reference genomes of bacterial and archaeal isolates expand coverage of the tree of life. Nat. Biotechnol. 35, 676–683 (2017).
Nasser, W. et al. Evolutionary pathway to increased virulence and epidemic group A Streptococcus disease derived from 3,615 genome sequences. Proc. Natl Acad. Sci. USA 111, E1768–E1776 (2014).
Bruchmann, S. et al. Deep transcriptome profiling of clinical Klebsiella pneumoniae isolates reveals strain and sequence type-specific adaptation. Environ. Microbiol. 17, 4690–4710 (2015).
Dotsch, A. et al. The Pseudomonas aeruginosa transcriptional landscape is shaped by environmental heterogeneity and genetic variation. mBio 6, e00749 (2015).
Sharma-Kuinkel, B. K. et al. Potential influence of Staphylococcus aureus clonal complex 30 genotype and transcriptome on hematogenous infections. Open Forum Infect. Dis. 2, ofv093 (2015).
Felek, S., Tsang, T. M. & Krukonis, E. S. Three Yersinia pestis adhesins facilitate Yop delivery to eukaryotic cells and contribute to plague virulence. Infect. Immun. 78, 4134–4150 (2010).
Swearingen, M. C., Porwollik, S., Desai, P. T., McClelland, M. & Ahmer, B. M. Virulence of 32 Salmonella strains in mice. PLoS One 7, e36043 (2012).
Schreiber, H. L. T. et al. Bacterial virulence phenotypes of Escherichia coli and host susceptibility determine risk for urinary tract infections. Sci. Transl. Med. 9, eaaf1283 (2017).
Carapetis, J. R., Steer, A. C., Mulholland, E. K. & Weber, M. The global burden of group A streptococcal diseases. Lancet Infect. Dis. 5, 685–694 (2005).
Carapetis, J. R. et al. Acute rheumatic fever and rheumatic heart disease. Nat. Rev. Dis. Primers 2, 15084 (2016).
Zhu, L. et al. A molecular trigger for intercontinental epidemics of group A Streptococcus. J. Clin. Invest. 125, 3545–3559 (2015).
Zhu, L., Olsen, R. J., Nasser, W., de la Riva Morales, I. & Musser, J. M. Trading capsule for increased cytotoxin production: contribution to virulence of a newly emerged clade of emm89 Streptococcus pyogenes. mBio 6, e01378-15 (2015).
Colman, G., Tanna, A., Efstratiou, A. & Gaworzewska, E. T. The serotypes of Streptococcus pyogenes present in Britain during 1980–1990 and their association with disease. J. Med. Microbiol. 39, 165–178 (1993).
Gherardi, G., Vitali, L. A. & Creti, R. Prevalent emm types among invasive GAS in Europe and North America since year 2000. Front. Public Health 6, 59 (2018).
Smit, P. W. et al. Epidemiology and emm types of invasive group A streptococcal infections in Finland, 2008–2013. Eur. J. Clin. Microbiol. Infect. Dis. 34, 2131–2136 (2015).
Ikebe, T. et al. Increased prevalence of group A Streptococcus isolates in streptococcal toxic shock syndrome cases in Japan from 2010 to 2012. Epidemiol. Infect. 143, 864–872 (2015).
Naseer, U., Steinbakk, M., Blystad, H. & Caugant, D. A. Epidemiology of invasive group A streptococcal infections in Norway 2010–2014: a retrospective cohort study. Eur. J. Clin. Microbiol. Infect. Dis. 35, 1639–1648 (2016).
Nelson, G. E. et al. Epidemiology of invasive group A streptococcal infections in the United States, 2005–2012. Clin. Infect. Dis. 63, 478–486 (2016).
Plainvert, C. et al. Invasive group A streptococcal infections in adults, France (2006–2010). Clin. Microbiol. Infect. 18, 702–710 (2012).
Al-Shahib, A. et al. Emergence of a novel lineage containing a prophage in emm/M3 group A Streptococcus associated with upsurge in invasive disease in the UK. Microb. Genom. 2, e000059 (2016).
Davies, M. R. et al. Emergence of scarlet fever Streptococcus pyogenes emm12 clones in Hong Kong is associated with toxin acquisition and multidrug resistance. Nat. Genet. 47, 84–87 (2015).
Fittipaldi, N. et al. Full-genome dissection of an epidemic of severe invasive disease caused by a hypervirulent, recently emerged clone of group A Streptococcus. Am. J. Pathol. 180, 1522–1534 (2012).
Hamilton, S. M., Stevens, D. L. & Bryant, A. E. Pregnancy-related group A streptococcal infections: temporal relationships between bacterial acquisition, infection onset, clinical findings, and outcome. Clin. Infect. Dis. 57, 870–876 (2013).
Johnson, D. R., Stevens, D. L. & Kaplan, E. L. Epidemiologic analysis of group A streptococcal serotypes associated with severe systemic infections, rheumatic fever, or uncomplicated pharyngitis. J. Infect. Dis. 166, 374–382 (1992).
Shea, P. R. et al. Group A Streptococcus emm gene types in pharyngeal isolates, Ontario, Canada, 2002–2010. Emerg. Infect. Dis. 17, 2010–2017 (2011).
Smoot, J. C. et al. Genome sequence and comparative microarray analysis of serotype M18 group A Streptococcus strains associated with acute rheumatic fever outbreaks. Proc. Natl Acad. Sci. USA 99, 4668–4673 (2002).
Ben Zakour, N. L., Venturini, C., Beatson, S. A. & Walker, M. J. Analysis of a Streptococcus pyogenes puerperal sepsis cluster by use of whole-genome sequencing. J. Clin. Microbiol. 50, 2224–2228 (2012).
Chuang, I., Van Beneden, C., Beall, B. & Schuchat, A. Population-based surveillance for postpartum invasive group A Streptococcus infections, 1995–2000. Clin. Infect. Dis. 35, 665–670 (2002).
Gaworzewska, E. & Colman, G. Changes in the pattern of infection caused by Streptococcus pyogenes. Epidemiol. Infect. 100, 257–269 (1988).
Raymond, J., Schlegel, L., Garnier, F. & Bouvet, A. Molecular characterization of Streptococcus pyogenes isolates to investigate an outbreak of puerperal sepsis. Infect. Control Hosp. Epidemiol. 26, 455–461 (2005).
Sims, D., Sudbery, I., Ilott, N. E., Heger, A. & Ponting, C. P. Sequencing depth and coverage: key considerations in genomic analyses. Nat. Rev. Genet. 15, 121–132 (2014).
Bricker, A. L., Carey, V. J. & Wessels, M. R. Role of NADase in virulence in experimental invasive group A streptococcal infection. Infect. Immun. 73, 6562–6566 (2005).
Bricker, A. L., Cywes, C., Ashbaugh, C. D. & Wessels, M. R. NAD+-glycohydrolase acts as an intracellular toxin to enhance the extracellular survival of group A streptococci. Mol. Microbiol. 44, 257–269 (2002).
Sumby, P. et al. Evolutionary origin and emergence of a highly successful clone of serotype M1 group A Streptococcus involved multiple horizontal gene transfer events. J. Infect. Dis. 192, 771–782 (2005).
Zhu, L. et al. Contribution of secreted NADase and streptolysin O to the pathogenesis of epidemic serotype M1 Streptococcus pyogenes infections. Am. J. Pathol. 187, 605–613 (2017).
Meehl, M. A., Pinkner, J. S., Anderson, P. J., Hultgren, S. J. & Caparon, M. G. A novel endogenous inhibitor of the secreted streptococcal NAD-glycohydrolase. PLoS Pathog. 1, e35 (2005).
Tatsuno, I. et al. Characterization of the NAD-glycohydrolase in streptococcal strains. Microbiology 153, 4253–4260 (2007).
Shimomura, Y. et al. Complete genome sequencing and analysis of a Lancefield group G Streptococcus dysgalactiae subsp. equisimilis strain causing streptococcal toxic shock syndrome (STSS). BMC Genomics 12, 17 (2011).
Carroll, R. K. et al. Naturally occurring single amino acid replacements in a regulatory protein alter streptococcal gene expression and virulence in mice. J. Clin. Invest. 121, 1956–1968 (2011).
Graham, M. R. et al. Virulence control in group A Streptococcus by a two-component gene regulatory system: global expression profiling and in vivo infection modeling. Proc. Natl Acad. Sci. USA 99, 13855–13860 (2002).
Ribardo, D. A. & McIver, K. S. Defining the Mga regulon: comparative transcriptome analysis reveals both direct and indirect regulation by Mga in the group A Streptococcus. Mol. Microbiol. 62, 491–508 (2006).
Ramalinga, A., Danger, J. L., Makthal, N., Kumaraswami, M. & Sumby, P. Multimerization of the virulence-enhancing group A Streptococcus transcription factor RivR is required for regulatory activity. J. Bacteriol. 199, e00452-16 (2017).
Trevino, J., Liu, Z., Cao, T. N., Ramirez-Pena, E. & Sumby, P. RivR is a negative regulator of virulence factor expression in group A Streptococcus. Infect. Immun. 81, 364–372 (2013).
Nyberg, P., Rasmussen, M. & Bjorck, L. α2-Macroglobulin-proteinase complexes protect Streptococcus pyogenes from killing by the antimicrobial peptide LL-37. J. Biol. Chem. 279, 52820–52823 (2004).
Rasmussen, M., Muller, H. P. & Bjorck, L. Protein GRAB of Streptococcus pyogenes regulates proteolysis at the bacterial surface by binding α2-macroglobulin. J. Biol. Chem. 274, 15336–15344 (1999).
Toppel, A. W., Rasmussen, M., Rohde, M., Medina, E. & Chhatwal, G. S. Contribution of protein G-related α2-macroglobulin-binding protein to bacterial virulence in a mouse skin model of group A streptococcal infection. J. Infect. Dis. 187, 1694–1703 (2003).
Haas, B. J., Chin, M., Nusbaum, C., Birren, B. W. & Livny, J. How deep is deep enough for RNA-Seq profiling of bacterial transcriptomes? BMC Genomics 13, 734 (2012).
Shishkin, A. A. et al. Simultaneous generation of many RNA-Seq libraries in a single reaction. Nat. Methods 12, 323–325 (2015).
Engleberg, N. C., Heath, A., Miller, A., Rivera, C. & DiRita, V. J. Spontaneous mutations in the CsrRS two-component regulatory system of Streptococcus pyogenes result in enhanced virulence in a murine model of skin and soft tissue infection. J. Infect. Dis. 183, 1043–1054 (2001).
Li, J. et al. Neutrophils select hypervirulent CovRS mutants of M1T1 group A Streptococcus during subcutaneous infection of mice. Infect. Immun. 82, 1579–1590 (2014).
Mayfield, J. A. et al. Mutations in the control of virulence sensor gene from Streptococcus pyogenes after infection in mice lead to clonal bacterial variants with altered gene regulatory activity and virulence. PLoS One 9, e100698 (2014).
Sumby, P., Whitney, A. R., Graviss, E. A., DeLeo, F. R. & Musser, J. M. Genome-wide analysis of group A streptococci reveals a mutation that modulates global phenotype and disease specificity. PLoS Pathog. 2, e5 (2006).
Tatsuno, I., Okada, R., Zhang, Y., Isaka, M. & Hasegawa, T. Partial loss of CovS function in Streptococcus pyogenes causes severe invasive disease. BMC Res. Notes 6, 126 (2013).
Trevino, J. et al. CovS simultaneously activates and inhibits the CovR-mediated repression of distinct subsets of group A Streptococcus virulence factor-encoding genes. Infect. Immun. 77, 3141–3149 (2009).
Breiman, L. Random Forests. Mach. Learn. 45, 5–32 (2001).
Stalhammar-Carlemalm, M., Areschoug, T., Larsson, C. & Lindahl, G. The R28 protein of Streptococcus pyogenes is related to several group B streptococcal surface proteins, confers protective immunity and promotes binding to human epithelial cells. Mol. Microbiol. 33, 208–219 (1999).
Stalhammar-Carlemalm, M., Stenberg, L. & Lindahl, G. Protein rib: a novel group B streptococcal cell surface protein that confers protective immunity and is expressed by most strains causing invasive infections. J. Exp. Med. 177, 1593–1603 (1993).
Beres, S. B. & Musser, J. M. Contribution of exogenous genetic elements to the group A Streptococcus metagenome. PLoS One 2, e800 (2007).
Green, N. M. et al. Genome sequence of a serotype M28 strain of group A Streptococcus: potential new insights into puerperal sepsis and bacterial disease specificity. J. Infect. Dis. 192, 760–770 (2005).
Coll, F. et al. Genome-wide analysis of multi- and extensively drug-resistant Mycobacterium tuberculosis. Nat. Genet. 50, 307–316 (2018).
Earle, S. G. et al. Identifying lineage effects when controlling for population structure improves power in bacterial association studies. Nat. Microbiol. 1, 16041 (2016).
Gibson, G., Powell, J. E. & Marigorta, U. M. Expression quantitative trait locus analysis for translational medicine. Genome Med. 7, 60 (2015).
Nicolae, D. L. et al. Trait-associated SNPs are more likely to be eQTLs: annotation to enhance discovery from GWAS. PLoS Genet. 6, e1000888 (2010).
Olsen, R. J. & Musser, J. M. Molecular pathogenesis of necrotizing fasciitis. Annu. Rev. Pathol. 5, 1–31 (2010).
Rodriguez-Ortega, M. J. et al. Characterization and identification of vaccine candidate proteins through analysis of the group A Streptococcus surface proteome. Nat. Biotechnol. 24, 191–197 (2006).
Zhu, L. et al. Intergenic variable-number tandem-repeat polymorphism upstream of rocA alters toxin production and enhances virulence in Streptococcus pyogenes. Infect. Immun. 84, 2086–2093 (2016).
Hammarlof, D. L. et al. Role of a single noncoding nucleotide in the evolution of an epidemic African clade of Salmonella. Proc. Natl Acad. Sci. USA 115, E2614–E2623 (2018).
Blount, Z. D., Barrick, J. E., Davidson, C. J. & Lenski, R. E. Genomic analysis of a key innovation in an experimental Escherichia coli population. Nature 489, 513–518 (2012).
Zaunbrecher, M. A., Sikes, R. D. Jr, Metchock, B., Shinnick, T. M. & Posey, J. E. Overexpression of the chromosomally encoded aminoglycoside acetyltransferase eis confers kanamycin resistance in Mycobacterium tuberculosis. Proc. Natl Acad. Sci. USA 106, 20004–20009 (2009).
Puopolo, K. M. & Madoff, L. C. Upstream short sequence repeats regulate expression of the alpha C protein of group B Streptococcus. Mol. Microbiol. 50, 977–991 (2003).
Stalhammar-Carlemalm, M., Areschoug, T., Larsson, C. & Lindahl, G. Cross-protection between group A and group B streptococci due to cross-reacting surface proteins. J. Infect. Dis. 182, 142–149 (2000).
Weckel, A. et al. The N-terminal domain of the R28 protein promotes emm28 group A Streptococcus adhesion to host cells via direct binding to three integrins. J. Biol. Chem. 293, 16006–16018 (2018).
Valdes, K. M. et al. The fruRBA operon is necessary for group A streptococcal growth in fructose and for resistance to neutrophil killing during growth in whole human blood. Infect. Immun. 84, 1016–1031 (2016).
Jeukens, J. et al. Genomics of antibiotic-resistance prediction in Pseudomonas aeruginosa. Ann. NY Acad. Sci. 1435, 5–17 (2017).
Nguyen, M. et al. Developing an in silico minimum inhibitory concentration panel test for Klebsiella pneumoniae. Sci. Rep. 8, 421 (2018).
Pesesky, M. W. et al. Evaluation of machine learning and rules-based approaches for predicting antimicrobial resistance profiles in Gram-negative bacilli from whole genome sequence data. Front. Microbiol. 7, 1887 (2016).
Rishishwar, L., Petit, R. A. 3rd, Kraft, C. S. & Jordan, I. K. Genome sequence-based discriminator for vancomycin-intermediate Staphylococcus aureus. J. Bacteriol. 196, 940–948 (2014).
Li, Y. et al. Validation of beta-lactam minimum inhibitory concentration predictions for pneumococcal isolates with newly encountered penicillin binding protein (PBP) sequences. BMC Genomics 18, 621 (2017).
Li, Y. et al. Penicillin-binding protein transpeptidase signatures for tracking and predicting beta-lactam resistance levels in Streptococcus pneumoniae. mBio 7, e00756-16 (2016).
Hao, K. et al. Lung eQTLs to help reveal the molecular underpinnings of asthma. PLoS Genet. 8, e1003029 (2012).
Naranbhai, V. et al. Genomic modulators of gene expression in human neutrophils. Nat. Commun. 6, 7545 (2015).
Ongen, H. et al. Estimating the causal tissues for complex traits and diseases. Nat. Genet. 49, 1676–1683 (2017).
Tung, J., Zhou, X., Alberts, S. C., Stephens, M. & Gilad, Y. The genetic architecture of gene expression levels in wild baboons. eLife https://doi.org/10.7554/eLife.04729.001 (2015).
Albert, F. W., Treusch, S., Shockley, A. H., Bloom, J. S. & Kruglyak, L. Genetics of single-cell protein abundance variation in large yeast populations. Nature 506, 494–497 (2014).
Parker, C. C. et al. Genome-wide association study of behavioral, physiological and gene expression traits in outbred CFW mice. Nat. Genet. 48, 919–926 (2016).
Francesconi, M. & Lehner, B. The effects of genetic variation on gene expression dynamics during development. Nature 505, 208–211 (2014).
Beres, S. B. et al. Molecular complexity of successive bacterial epidemics deconvoluted by comparative pathogenomics. Proc. Natl Acad. Sci. USA 107, 4371–4376 (2010).
Olsen, R. J. et al. The majority of 9,729 group A Streptococcus strains causing disease secrete SpeB cysteine protease: pathogenesis implications. Infect. Immun. 83, 4750–4758 (2015).
Beres, S. B. et al. Genome sequence analysis of emm89 Streptococcus pyogenes strains causing infections in Scotland, 2010–2016. J. Med. Microbiol. 66, 1765–1773 (2017).
Liu, Y., Schroder, J. & Schmidt, B. Musket: a multistage k-mer spectrum-based error corrector for Illumina sequence data. Bioinformatics 29, 308–315 (2013).
Walker, B. J. et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One 9, e112963 (2014).
Bankevich, A. et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19, 455–477 (2012).
Inouye, M. et al. SRST2: rapid genomic surveillance for public health and hospital microbiology labs. Genome Med. 6, 90 (2014).
Croucher, N. J. et al. Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gubbins. Nucleic Acids Res. 43, e15 (2015).
Cheng, L., Connor, T. R., Siren, J., Aanensen, D. M. & Corander, J. Hierarchical and spatially explicit clustering of DNA sequences with BAPS software. Mol. Biol. Evol. 30, 1224–1228 (2013).
Huson, D. H. SplitsTree: analyzing and visualizing evolutionary data. Bioinformatics 14, 68–73 (1998).
Wick, R. R., Judd, L. M., Gorrie, C. L. & Holt, K. E. Unicycler: resolving bacterial genome assemblies from short and long sequencing reads. PLoS Comput. Biol. 13, e1005595 (2017).
Long, S. W., Kachroo, P., Musser, J. M. & Olsen, R. J. Whole-genome sequencing of a human clinical isolate of emm28 Streptococcus pyogenes causing necrotizing fasciitis acquired contemporaneously with Hurricane Harvey. Genome Announc. 5, e01269-17 (2017).
Lees, J. A. et al. Sequence element enrichment analysis to determine the genetic basis of bacterial phenotypes. Nat. Commun. 7, 12797 (2016).
Bishop, C. Pattern Recognition and Machine Learning (Springer, New York, 2006).
Eraso, J. M. et al. Genomic landscape of intrahost variation in group A Streptococcus: repeated and abundant mutational inactivation of the fabT gene encoding a regulator of fatty acid synthesis. Infect. Immun. 84, 3268–3281 (2016).
Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
Magoc, T., Wood, D. & Salzberg, S. L. EDGE-pro: estimated degree of gene expression in prokaryotic genomes. Evol. Bioinform. Online 9, 127–136 (2013).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-Seq data with DESeq2. Genome Biol. 15, 550 (2014).
Hoffman, G. E. & Schadt, E. E. variancePartition: interpreting drivers of variation in complex gene expression studies. BMC Bioinformatics 17, 483 (2016).
Tarazona, S. et al. Data quality aware analysis of differential expression in RNA-Seq with NOISeq R/Bioc package. Nucleic Acids Res. 43, e140 (2015).
Ondov, B. D. et al. Mash: fast genome and metagenome distance estimation using MinHash. Genome Biol. 17, 132 (2016).
Shabalin, A. A. Matrix eQTL: ultra fast eQTL analysis via large matrix operations. Bioinformatics 28, 1353–1358 (2012).
Committee for the Update of the Guide for the Care and Use of Laboratory Animals, Institute for Laboratory Animal Research & Division on Earth and Life Studies Guide for the Care and Use of Laboratory Animals 8th edn. (National Academies Press, Washington, DC, 2011).
This study was supported in part by the Fondren Foundation, Houston Methodist Hospital and Research Institute (to J.M.M.), the Academy of Finland (grant 255636 to J.V.), a European Research Council grant (number 742158 to J.C.), and a National Institutes of Health grant (1R01AI109096-01A1 to M.K.). This research was also supported in part by the Intramural Research Program of the National Institute of Allergy and Infectious Diseases, National Institutes of Health (to F.R.D.). We thank N. Copeland, N. Jenkins, and D. Ginsburg for critical comments and suggestions to improve the manuscript; K. Stockbauer for critical comments and editorial assistance; E. Graviss, H. Erlendsdottir, W. Hong, and S. Linson for technical assistance; H.-L. Hyyryläinen, J. Jalava, and the Finnish clinical microbiology laboratories; A. A. Shishkin for helpful suggestions regarding the RNAtag-seq protocol; M. Todorovic and J. Jonsdottir Nielsen for banking strains from the Faroe Islands; A. McGeer for Ontario strains; C. Van Beneden, B. Beall, and the Active Bacterial Core Surveillance of the CDC’s Emerging Infections Programs network; A. Ramstad Alme and A. Witsø for technical assistance; and M. Steinbakk (Norwegian Laboratory for Streptococci) for support.
The authors declare no competing interests.
Integrated supplementary information
All strains were isolated during a 26-year period, spanning 1991 through 2016. (a) Distribution of strains by country. Vertical black bars indicate the number of isolates per year. The total number of strains isolated in the USA was 952, of which 951 strains were collected as part of the Active Bacterial Core (ABC) surveillance study conducted by the Centers for Disease Control and Prevention (refs. 32, 68, 111, 112; see https://www.cdc.gov/abcs/index.html for a complete description of the study). The one additional strain (from Texas) is strain MGAS6180, which is the genome sequence reference strain. Canadian strains are all from Ontario. The Faroe Islands are a self-governing part of Denmark. Regardless of country, all strains were recovered as part of comprehensive, population-based studies. (b) Distribution of emm28 isolates by state in the USA. All strains were isolated during a period of 18 years, spanning 1995 through 2012. Vertical black bars indicate the number of isolates per year. For the U.S. isolates, the states have been coded (A–J) at the request of the Centers for Disease Control and Prevention.
(a) Next-generation sequencing data analysis pipeline employed for the preprocessing, read mapping, variant discovery and downstream genomic analyses of whole-genome sequencing data. MLST, multilocus sequence type; SNP, single nucleotide polymorphism; HGT, horizontal gene transfer. (b) Bioinformatics pipeline for demultiplexing, quality assessment, adapter trimming, read mapping, data normalization and differential expression analysis of transcriptome data.
Strains are represented by country and year of isolation. Only strains belonging to subclades 1A (SC1A-red), 1B (SC1B-blue), 2A (SC2A-green), and 2B (SC2B-brown) are shown. (a) Vertical bars indicate the number of isolates per year. The number (n) of strains isolated in each country is shown. Six distant outlier strains in the phylogenetic tree and 7 strains from the Faroe Islands are not shown. Thus, the number of strains does not sum to the total sample of 2,101 strains. No strains belonging to subclade SC2A or SC2B were isolated in Iceland. (b) Total number of strains belonging to each individual subclade per country. US, United States; CA, Canada (Ontario); FI, Finland; NO, Norway; IS, Iceland. Others refers to 6 distant outlier strains in the phylogenetic tree.
Comparison of biological replicates per strain at mid-exponential (a) and early-stationary phase (b). Mean correlation coefficient (Pearson) and standard deviation of normalized and log-transformed transcript counts for three biological replicates per strain are plotted.
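The replicate comparison described in this caption can be illustrated with a minimal sketch. It assumes a log2 transform with a pseudocount of 1 and uses invented counts; the function names and data are hypothetical and are not taken from the study's pipeline.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def replicate_concordance(replicates, pseudocount=1.0):
    """Mean and sample SD of pairwise Pearson r across replicates.

    Counts are log2-transformed after adding a pseudocount
    (both choices are assumptions made for this illustration).
    """
    logged = [[math.log2(c + pseudocount) for c in rep] for rep in replicates]
    rs = [pearson(a, b) for a, b in combinations(logged, 2)]
    mean_r = sum(rs) / len(rs)
    sd_r = math.sqrt(sum((r - mean_r) ** 2 for r in rs) / (len(rs) - 1))
    return mean_r, sd_r


# Hypothetical normalized counts: three replicates of one strain, five genes.
reps = [
    [120.0, 15.0, 980.0, 0.0, 43.0],
    [110.0, 18.0, 1010.0, 1.0, 40.0],
    [131.0, 12.0, 950.0, 0.0, 47.0],
]
mean_r, sd_r = replicate_concordance(reps)
```

With three replicates there are three pairwise comparisons per strain, so the mean and standard deviation summarize three correlation coefficients.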
(a) Schematic depicts number of differentially expressed (DE) genes obtained by comparing transcriptome data for strains in the three major genetic subclades at mid-exponential (ME) and early-stationary (ES) phases. (b) Fold-increase in nga-ifs-slo transcript levels in SC2A (n = 15) strains compared to SC1A (n = 12) and SC1B (n = 23) strains at ME and ES phase. (c) grab gene transcript levels (normalized counts) were significantly increased in SC1B (n = 23) strains compared to SC1A (n = 12) strains at both growth phases (ME and ES). A significant increase in grab transcript levels in SC2A (n = 15) strains compared to SC1A (n = 12) strains was observed at ES phase. Statistical tests were performed using Mann-Whitney (two-tailed) test. Data are presented as box and whisker plots, where whiskers represent the minimum and maximum values. n represents the number of strains; each strain has three independent biological replicates.
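As a rough illustration of the two-tailed Mann-Whitney comparison named in this caption, the sketch below computes the U statistics directly from pairwise comparisons and approximates the P value with a normal distribution (no tie or continuity correction). The transcript counts are invented, and the approximation is an assumption of this sketch rather than the procedure the authors used.

```python
import math


def mann_whitney_u(x, y):
    """Return (U_x, U_y, two-sided p) with a normal approximation.

    U_x counts pairs where a value in x exceeds a value in y,
    with ties contributing 0.5.
    """
    n1, n2 = len(x), len(y)
    u_x = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    u_y = n1 * n2 - u_x
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (min(u_x, u_y) - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail probability
    return u_x, u_y, p


# Hypothetical normalized grab transcript counts for two subclades.
sc1a = [210.0, 250.0, 190.0, 230.0, 205.0]
sc1b = [480.0, 520.0, 455.0, 610.0, 530.0, 495.0]
u_sc1a, u_sc1b, p_value = mann_whitney_u(sc1a, sc1b)
```

For small samples an exact test is preferable to the normal approximation; the sketch only shows how the rank-based statistic is formed.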
Supplementary Figure 6 Comparison of three replicates versus single replicate and RNA-seq versus RNAtag-seq.
(a) Scatterplots comparing WT-like strains from each of three major subclades (10 SC1A, 22 SC1B, 14 SC2A) using triplicates versus one randomly selected replicate from the 50-strain data. Presence of three biological replicates in the 50-strain data allowed us to simulate comparisons of averaged normalized counts when three versus one replicate were used. Strong correlation (r = 0.99) was observed for each triplicate- versus single-replicate comparison. Pearson correlation coefficient (r) is shown for each comparison. n represents number of samples (number of strains multiplied by number of replicates). (b) Seven strains were processed using the two protocols, that is, RNA-seq (three biological replicates per strain) and RNAtag-seq (singletons, that is, using single replicates). Principal component analysis of the seven strains processed using RNA-seq (three spheres colored cyan in the PCA plot) and RNAtag-seq (single sphere colored red in the PCA plot) displays overlapping spatial clustering. Expression profile of the 7 strains in the PCA plot is circled and numbered 1 through 7. Strains analyzed: 1-MGAS7888, 2-MGAS29284, 3-MGAS29553, 4-MGAS28746, 5-MGAS7914, 6-MGAS28647, and 7-MGAS28686. (c) Scatterplots were generated for the normalized counts (log-transformed) from the aforementioned seven strains processed using the two protocols, that is, RNA-seq and RNAtag-seq. For each strain, normalized transcript counts were averaged over the three biological replicates (RNA-seq protocol) and compared to RNAtag-seq normalized counts (singleton strain samples). Pearson correlation coefficient (r) is shown for each comparison.
Supplementary Figure 7 Strategy used to make pools and superpools and their sequencing read content (millions).
(a) Strategy used to make pools and superpools. Strains (small yellow circles) were grouped to form 58 distinct pools (gray circles) by labeling total RNA extracted from each strain with unique barcoded oligoribonucleotides. RNA from 8 strains was mixed to create one pool, with the exception of pool 58, which contained RNA from only 5 strains. cDNAs from each pool were individually barcoded with Illumina P7 index oligonucleotides. Four different P7 oligonucleotides were used in this study. Four pools were mixed to form one superpool (large yellow circles), for a total of 15 superpools; superpool 15 contained only two pools. RNAtag-seq analysis was originally performed on 461 strains, and data are presented here for 442; data from 19 strains were excluded because of low sequence coverage. (b) Average number of sequence reads per pool for each of the 15 superpools. Each circle represents the mean and error bars represent the standard deviation (SD). The median was calculated using data for superpools 1–14 (each comprising four pools). (c) Median number of reads per sample per pool, in millions. Median reads per sample for pools 57 and 58 are larger because of the higher sequencing depth of these pools.
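The pool and superpool counts in this caption follow from simple chunking arithmetic, which the sketch below reproduces. The strain labels and the sequential assignment of strains to pools are assumptions made for illustration; the caption does not state how strains were allocated.

```python
def chunk(items, size):
    """Split a list into consecutive groups of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# Hypothetical labels for the 461 strains processed by RNAtag-seq.
strains = [f"strain_{i:03d}" for i in range(1, 462)]

pools = chunk(strains, 8)      # 8 uniquely barcoded RNAs per pool
superpools = chunk(pools, 4)   # 4 pools (one per P7 index) per superpool

n_pools = len(pools)                      # 57 full pools plus 1 partial pool
last_pool_size = len(pools[-1])           # strains in pool 58
n_superpools = len(superpools)            # 14 full superpools plus 1 partial
last_superpool_size = len(superpools[-1]) # pools in superpool 15
n_retained = len(strains) - 19            # strains kept after coverage filter
```

The numbers agree with the caption: 57 × 8 + 5 = 461 strains, 14 × 4 + 2 = 58 pools, and 461 − 19 = 442 strains with usable data.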
(a) The two major clusters identified by DBSCAN are shown. (b) No subclade-specific clustering was evident within the two clusters. (c) Twenty strains with ropB mutations are outliers (colored yellow) and group away from the other strains with ropB mutations (colored orange). ropB-non-outlier strains cluster with WT-like strains (colored light blue) and strains with mutations in other major regulator genes (colored blue). (d) Cluster A ropB mutant strains separated into two groups validated by k-means clustering and were designated arbitrarily as Group I and Group II. (e) Group II ropB mutant strains had significantly decreased speB transcript levels compared to Group I strains (Mann-Whitney, two-tailed, P < 0.0001). (f) Mutations were mapped onto the crystal structure of the C-terminal region of the RopB protein. Variant amino acid positions associated with Group I or Group II organisms are labeled in red and pink, respectively. Amino acid residues present in inferred functional domains are demarcated with ovals. Mutations located in RopB functional domains were present at significantly increased frequency (test of proportions-one-tailed, P < 0.05) in Group II strains (pink labels within ovals) compared to Group I strains (red labels within ovals). PBD: peptide binding domain, NTD: N-terminal domain. The crystal structure of the NTD has not been solved. (g) Kaplan-Meier curve showing that the Group I (n = 3) and Group II (n = 4) strains differ significantly (log-rank test) in virulence in a mouse necrotizing myositis infection model (40 mice per strain). (h) Gross pathology images of infected mouse hindlimbs (n = 5 mice per strain) reflect the difference in virulence between the Group I (top) and Group II (bottom) strains, and representative images are displayed. Boxed areas demarcated in white illustrate major lesion areas.
Supplementary Figure 9 Lack of significant relationship between extent of transcriptome remodeling (number of DE genes) and genetic distance.
(a) Scatterplot comparing the number of differentially expressed (DE) genes and the genetic distance for the 442 singleton strains. For each strain, genes were called differentially expressed relative to reference strain MGAS28737. Genetic distance was measured as the number of core chromosomal SNPs relative to strain MGAS28737. The red line represents the regression line. No significant correlation was observed between genetic distance and extent of transcriptome remodeling (number of DE genes), with an R² value of 0.0046. (b) No improvement in correlation (R² = 0.0040) was observed when the analysis was conducted using only data for the 188 strains that have wild-type alleles for all known major regulatory genes. Red, SC1A; blue, SC1B; green, SC2A; yellow, SC2B. R² values were calculated by linear regression analysis.
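For a simple linear regression with one predictor, R² equals the squared Pearson correlation between the two variables. The sketch below computes it that way on toy inputs. The function name `r_squared` and the example values are illustrative assumptions, not the study's data or code.

```python
def r_squared(x, y):
    """Coefficient of determination for a simple linear regression of y on x.

    With a single predictor, R^2 is the squared Pearson correlation.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy ** 2 / (sxx * syy)


# Toy inputs: x stands in for SNP distance, y for number of DE genes.
perfect = r_squared([1, 2, 3, 4], [2, 4, 6, 8])     # y is an exact linear function of x
unrelated = r_squared([1, 2, 3, 4], [1, -1, -1, 1])  # no linear trend at all
print(perfect, unrelated)
```

Values near zero, such as the 0.0046 and 0.0040 reported in the legend, mean that genetic distance explains almost none of the variance in the number of DE genes.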
(a) Genome-wide association analysis was performed on 442 strains. The Manhattan plot shows the statistical significance (y-axis) of each k-mer (red circles) positively associated with high transcript expression of genes Spy1336/R28 and Spy1337, and its position along the 1.9 Mb GAS genome. Significant k-mers mapped to only one region of the chromosome, corresponding to the intergenic region between the Spy1336/R28 and Spy1337 genes. The top part is a schematic of the GAS genome, with vertical blue lines corresponding to open reading frames (ORFs) encoded by each strand of the chromosome. The bottom part shows an enlargement of the genome location corresponding to Spy1336/R28 and Spy1337, and the intergenic region. P values were computed by SEER software (Methods). (b) eQTL analysis identified a significant association between genotype (9T versus 10T) and expression level of genes Spy1336/R28 and Spy1337 in 50 strains at mid-exponential phase (left panel) and in 442 strains at early-stationary phase (right panel). Horizontal black bars represent mean transcript expression and standard deviation. PeQTL refers to q-values (false discovery rate, FDR) as reported by the MatrixEQTL package. The threshold used for genome-wide significance was an adjusted P value < 10e-8.
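The q-values in panel (b) are false discovery rate estimates. A minimal sketch of how such values can be derived from raw P values is given below, assuming the standard Benjamini-Hochberg procedure. The function name and the input P values are illustrative only and are not taken from the study or from the MatrixEQTL code.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        qvalues[index] = running_min
    return qvalues


# Hypothetical raw p-values for four gene-variant tests.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
print(adjusted)
```

Each adjusted value is the smallest FDR at which the corresponding test would be called significant, which is why a q-value can never be smaller than its raw P value.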
Supplementary Figures 1–10, Supplementary Note and Supplementary Tables 1, 7, 9, 10, 13 and 17–19
SNPs largely present in SC2A but absent in SC1A post Gubbins
Inferred MGE content for the 20 most prevalent MGE genotypes in the S. pyogenes emm28 cohort
MGE genotype based on the presence or absence of 50 phage and ICE encoded genes, 31 integrases and 19 secreted virulence factors, derived from MGEs identified in 60 complete S. pyogenes
SRST2 MGE-50 absence/presence matrix and genotype
List of differentially expressed genes comparing the three major genetic subclades at mid-exponential and stationary phase
Regulatory gene mutation prediction by machine learning
List of differentially expressed genes comparing transcriptomic clusters within CovR/CovS mutant strains
List of differentially expressed genes between Group II versus Group I ropB mutant strains
List of differentially expressed genes comparing the isogenic strains with either 9Ts or 10Ts in the intergenic region between the Spy1336/R28 and Spy1337 genes
Results of eQTL analysis
Data quality metrics for the 2,101 emm28 cohort
Kachroo, P., Eraso, J.M., Beres, S.B. et al. Integrated analysis of population genomics, transcriptomics and virulence provides novel insights into Streptococcus pyogenes pathogenesis. Nat Genet 51, 548–559 (2019). https://doi.org/10.1038/s41588-018-0343-1