Integrated analysis of population genomics, transcriptomics and virulence provides novel insights into Streptococcus pyogenes pathogenesis

Kachroo, Priyanka; Eraso, Jesus M.; Beres, Stephen B.; Olsen, Randall J.; Zhu, Luchang; Nasser, Waleed; Bernard, Paul E.; Cantu, Concepcion C.; Saavedra, Matthew Ojeda; Arredondo, María José; Strope, Benjamin; Do, Hackwon; Kumaraswami, Muthiah; Vuopio, Jaana; Gröndahl-Yli-Hannuksela, Kirsi; Kristinsson, Karl G.; Gottfredsson, Magnus; Pesonen, Maiju; Pensar, Johan; Davenport, Emily R.; Clark, Andrew G.; Corander, Jukka; Caugant, Dominique A.; Gaini, Shahin; Magnussen, Marita Debess; Kubiak, Samantha L.; Nguyen, Hoang A. T.; Long, S. Wesley; Porter, Adeline R.; DeLeo, Frank R.; Musser, James M.

doi:10.1038/s41588-018-0343-1

Article
Published: 18 February 2019

Priyanka Kachroo^1,na1 (ORCID: orcid.org/0000-0002-3563-7832),
Jesus M. Eraso^1,na1 (ORCID: orcid.org/0000-0003-1383-8702),
Stephen B. Beres^1 (ORCID: orcid.org/0000-0003-3041-0185),
Randall J. Olsen^1,2,3,
Luchang Zhu^1,
Waleed Nasser^1,
Paul E. Bernard^1,
Concepcion C. Cantu^1,
Matthew Ojeda Saavedra^1,
María José Arredondo^1,
Benjamin Strope^1,
Hackwon Do^1,
Muthiah Kumaraswami^1,
Jaana Vuopio^4,5 (ORCID: orcid.org/0000-0002-0795-4822),
Kirsi Gröndahl-Yli-Hannuksela^4,
Karl G. Kristinsson^6,7,
Magnus Gottfredsson^7,8 (ORCID: orcid.org/0000-0003-2465-0422),
Maiju Pesonen^9,10,
Johan Pensar^9,
Emily R. Davenport^11,
Andrew G. Clark^11,
Jukka Corander^9,12,
Dominique A. Caugant^13,
Shahin Gaini^14,15,16,17,
Marita Debess Magnussen^7,18,
Samantha L. Kubiak^1,
Hoang A. T. Nguyen^1,
S. Wesley Long^1,
Adeline R. Porter^19,
Frank R. DeLeo^19 &
…
James M. Musser^1,2,3 (ORCID: orcid.org/0000-0002-7765-4956)

Nature Genetics volume 51, pages 548–559 (2019)

Abstract

Streptococcus pyogenes causes 700 million human infections annually worldwide, yet, despite a century of intensive effort, there is no licensed vaccine against this bacterium. Although a number of large-scale genomic studies of bacterial pathogens have been published, the relationships among the genome, transcriptome, and virulence in large bacterial populations remain poorly understood. We sequenced the genomes of 2,101 emm28 S. pyogenes invasive strains, from which we selected 492 phylogenetically diverse strains for transcriptome analysis and 50 strains for virulence assessment. Data integration provided a novel understanding of the virulence mechanisms of this model organism. Genome-wide association study, expression quantitative trait loci analysis, machine learning, and isogenic mutant strains identified and confirmed a one-nucleotide indel in an intergenic region that significantly alters global transcript profiles and ultimately virulence. The integrative strategy that we used is generally applicable to any microbe and may lead to new therapeutics for many human pathogens.

**Fig. 1: Population genetic structure for 2,095 *S. pyogenes emm28* invasive infection isolates.**

**Fig. 2: Transcriptome analysis of the subset of 50 strains.**

**Fig. 3: Singleton strains (n = 442) partition into two major transcriptome clusters according to their genome-wide expression profiles.**

**Fig. 4: Variation in the numbers of differentially expressed genes between cluster A and B strains.**

**Fig. 5: Clustering of *covR* and *covS* mutant strains, and associated virulence.**

**Fig. 6: An intergenic single-nucleotide insertion increases *Spy1336/R28* expression and strain virulence.**

**Fig. 7: Mouse virulence data, NADase production, and *nga* transcript levels.**

Data availability

Whole-genome sequencing data for the 2,101 isolates studied have been deposited in the NCBI Sequence Read Archive under BioProject accession number PRJNA434389. The slightly updated complete genome sequence of the emm28 reference strain MGAS6180 (GenBank accession number CP000056) has been deposited in the NCBI GenBank database under the same accession number. Transcriptome data have been deposited in the Gene Expression Omnibus under accession GSE113058. The data that support the findings of this study are available from the corresponding author upon request.

Acknowledgements

This study was supported in part by the Fondren Foundation, Houston Methodist Hospital and Research Institute (to J.M.M.), the Academy of Finland (grant 255636 to J.V.), a European Research Council grant (number 742158 to J.C.), and a National Institutes of Health grant (1R01AI109096-01A1 to M.K.). This research was also supported in part by the Intramural Research Program of the National Institute of Allergy and Infectious Diseases, National Institutes of Health (to F.R.D.). We thank N. Copeland, N. Jenkins, and D. Ginsburg for critical comments and suggestions to improve the manuscript; K. Stockbauer for critical comments and editorial assistance; E. Graviss, H. Erlendsdottir, W. Hong, and S. Linson for technical assistance; H.-L. Hyyryläinen, J. Jalava, and the Finnish clinical microbiology laboratories; A. A. Shishkin for helpful suggestions regarding the RNAtag-seq protocol; M. Todorovic and J. Jonsdottir Nielsen for banking strains from the Faroe Islands; A. McGeer for Ontario strains; C. Van Beneden, B. Beall, and the Active Bacterial Core Surveillance of the CDC’s Emerging Infections Programs network; A. Ramstad Alme and A. Witsø for technical assistance; and M. Steinbakk (Norwegian Laboratory for Streptococci) for support.

Author information

These authors contributed equally: Priyanka Kachroo, Jesus M. Eraso.

Authors and Affiliations

Center for Molecular and Translational Human Infectious Diseases Research, Department of Pathology and Genomic Medicine, Houston Methodist Research Institute and Houston Methodist Hospital, Houston, TX, USA
Priyanka Kachroo, Jesus M. Eraso, Stephen B. Beres, Randall J. Olsen, Luchang Zhu, Waleed Nasser, Paul E. Bernard, Concepcion C. Cantu, Matthew Ojeda Saavedra, María José Arredondo, Benjamin Strope, Hackwon Do, Muthiah Kumaraswami, Samantha L. Kubiak, Hoang A. T. Nguyen, S. Wesley Long & James M. Musser
Department of Pathology and Laboratory Medicine, Weill Cornell Medical College, New York, NY, USA
Randall J. Olsen & James M. Musser
Department of Microbiology and Immunology, Weill Cornell Medical College, New York, NY, USA
Randall J. Olsen & James M. Musser
Institute of Biomedicine, Medical Microbiology and Immunology, University of Turku, Turku, Finland
Jaana Vuopio & Kirsi Gröndahl-Yli-Hannuksela
National Institute for Health and Welfare, Helsinki, Finland
Jaana Vuopio
Department of Clinical Microbiology, Landspitali University Hospital, Reykjavik, Iceland
Karl G. Kristinsson
Faculty of Medicine, School of Health Sciences, University of Iceland, Reykjavik, Iceland
Karl G. Kristinsson, Magnus Gottfredsson & Marita Debess Magnussen
Department of Infectious Diseases, Landspitali University Hospital, Reykjavik, Iceland
Magnus Gottfredsson
Helsinki Institute of Information Technology, Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
Maiju Pesonen, Johan Pensar & Jukka Corander
Department of Computer Science, Aalto University, Espoo, Finland
Maiju Pesonen
Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY, USA
Emily R. Davenport & Andrew G. Clark
Department of Biostatistics, University of Oslo, Oslo, Norway
Jukka Corander
Division for Infection Control and Environmental Health, Norwegian Institute of Public Health, Oslo, Norway
Dominique A. Caugant
Medical Department, Infectious Diseases Division, National Hospital of the Faroe Islands, Tórshavn, Denmark
Shahin Gaini
Department of Infectious Diseases, Odense University Hospital, Odense, Denmark
Shahin Gaini
Department of Clinical Research, University of Southern Denmark, Odense, Denmark
Shahin Gaini
Department of Science and Technology, Centre of Health Research, University of the Faroe Islands, Tórshavn, Denmark
Shahin Gaini
Thetis, Food and Environmental Laboratory, Torshavn, Denmark
Marita Debess Magnussen
Laboratory of Bacteriology, Rocky Mountain Laboratories, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Hamilton, MT, USA
Adeline R. Porter & Frank R. DeLeo

Authors

Priyanka Kachroo
Jesus M. Eraso
Stephen B. Beres
Randall J. Olsen
Luchang Zhu
Waleed Nasser
Paul E. Bernard
Concepcion C. Cantu
Matthew Ojeda Saavedra
María José Arredondo
Benjamin Strope
Hackwon Do
Muthiah Kumaraswami
Jaana Vuopio
Kirsi Gröndahl-Yli-Hannuksela
Karl G. Kristinsson
Magnus Gottfredsson
Maiju Pesonen
Johan Pensar
Emily R. Davenport
Andrew G. Clark
Jukka Corander
Dominique A. Caugant
Shahin Gaini
Marita Debess Magnussen
Samantha L. Kubiak
Hoang A. T. Nguyen
S. Wesley Long
Adeline R. Porter
Frank R. DeLeo
James M. Musser

Contributions

J.M.M. conceptualized the study. P.K., J.M.E., and J.M.M. designed the study. P.K., J.M.E., S.B.B., R.J.O., L.Z., W.N., P.E.B., C.C.C., M.O.S., M.J.A., B.S., M.P., J.P., J.C., S.L.K., H.A.T.N., S.W.L., and A.R.P. produced the data. P.K., J.M.E., S.B.B., R.J.O., L.Z., H.D., M.K., M.P., J.P., J.C., S.W.L., and F.R.D. analyzed the data. P.K. led the analyses of the transcriptome data. M.P., J.P., E.R.D., A.G.C., and J.C. provided scholarly input on the statistical analysis and presentation strategies. J.V., K.G.-Y.-H., K.G.K., M.G., D.A.C., S.G., and M.D.M. provided strains and metadata. All authors contributed to writing the manuscript. All authors reviewed and approved the final draft. P.K. and J.M.E. contributed equally to this work, as did S.B.B., R.J.O., and L.Z.

Corresponding author

Correspondence to James M. Musser.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 Distribution of emm28 isolates by country and state in the United States.

All strains were isolated during a 26-year period, spanning 1991 through 2016. (a) Distribution of strains by country. Vertical black bars indicate the number of isolates per year. The total number of strains isolated in the USA was 952, of which 951 strains were collected as part of the Active Bacterial Core (ABC) surveillance study conducted by the Centers for Disease Control and Prevention^{32,68,111,112} (see https://www.cdc.gov/abcs/index.html for a complete description of the study). The one additional strain (from Texas) is strain MGAS6180, which is the genome sequence reference strain. Canadian strains are all from Ontario. The Faroe Islands are a self-governing part of Denmark. Regardless of country, all strains were recovered as part of comprehensive, population-based studies. (b) Distribution of emm28 isolates by state in the USA. All strains were isolated during a period of 18 years, spanning 1995 through 2012. Vertical black bars indicate the number of isolates per year. For the U.S. isolates, the states have been coded (A-J) at the request of the Centers for Disease Control.

Supplementary Figure 2 Flowcharts depicting bacterial genome and transcriptome data analysis.

(a) Next-generation sequencing data analysis pipeline employed for the preprocessing, read mapping, variant discovery and downstream genomic analyses of whole-genome sequencing data. ^aMLST: Multilocus sequence type, ^bSNP: Single nucleotide polymorphism, ^cHGT: Horizontal gene transfer. (b) Bioinformatics pipeline for demultiplexing, quality assessment, adapter trimming, read mapping, and data normalization and differential expression of transcriptome data.

Supplementary Figure 3 Distribution of emm28 isolates by genetic subclade, country and year.

Strains are represented by country and year of isolation. Only strains belonging to subclades 1A (SC1A-red), 1B (SC1B-blue), 2A (SC2A-green), and 2B (SC2B-brown) are shown. (a) Vertical bars indicate the number of isolates per year. The number (n) of strains isolated in each country is shown. Six distant outlier strains in the phylogenetic tree and 7 strains from the Faroe Islands are not shown. Thus, the number of strains does not sum to the total sample of 2,101 strains. No strains belonging to subclade SC2A or SC2B were isolated in Iceland. (b) Total number of strains belonging to each individual subclade per country. US, United States; CA, Canada (Ontario); FI, Finland; NO, Norway; IS, Iceland. Others refers to 6 distant outlier strains in the phylogenetic tree.

Supplementary Figure 4 Correlation among biological replicates for 50 strains analyzed by RNA-seq.

Comparison of biological replicates per strain at mid-exponential (a) and early-stationary phase (b). Mean correlation coefficient (Pearson) and standard deviation of normalized and log-transformed transcript counts for three biological replicates per strain are plotted.
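The replicate comparison summarized in this legend can be illustrated with a minimal sketch. It assumes that "log-transformed" means log2 of the count plus a pseudocount of 1 and that the per-strain summary is the mean and standard deviation of the three pairwise Pearson coefficients; neither detail is stated in the legend, and the counts below are made-up values, not data from the study.

```python
import math
from itertools import combinations
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def replicate_agreement(replicates):
    """Mean and SD of pairwise Pearson r across replicates.

    Counts are log2-transformed with a pseudocount of 1 (an assumption).
    """
    logged = [[math.log2(c + 1) for c in rep] for rep in replicates]
    rs = [pearson(a, b) for a, b in combinations(logged, 2)]
    return mean(rs), stdev(rs)


# Hypothetical normalized counts for five genes in three replicates of one strain.
reps = [
    [120.0, 15.0, 980.0, 43.0, 5.0],
    [131.0, 12.0, 1010.0, 40.0, 7.0],
    [118.0, 17.0, 950.0, 47.0, 4.0],
]
mean_r, sd_r = replicate_agreement(reps)
print(f"mean r = {mean_r:.3f}, SD = {sd_r:.3f}")
```

Applying `replicate_agreement` to each strain in turn would give one mean and one standard deviation per strain, which is the kind of value the panels plot.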

Supplementary Figure 5 Transcriptome alterations and genetic subclades.

(a) Schematic depicts number of differentially expressed (DE) genes obtained by comparing transcriptome data for strains in the three major genetic subclades at mid-exponential (ME) and early-stationary (ES) phases. (b) Fold-increase in nga-ifs-slo transcript levels in SC2A (n = 15) strains compared to SC1A (n = 12) and SC1B (n = 23) strains at ME and ES phase. (c) grab gene transcript levels (normalized counts) were significantly increased in SC1B (n = 23) strains compared to SC1A (n = 12) strains at both growth phases (ME and ES). A significant increase in grab transcript levels in SC2A (n = 15) strains compared to SC1A (n = 12) strains was observed at ES phase. Statistical tests were performed using Mann-Whitney (two-tailed) test. Data are presented as box and whisker plots, where whiskers represent the minimum and maximum values. n represents the number of strains; each strain has three independent biological replicates.

Supplementary Figure 6 Comparison of three replicates versus single replicate and RNA-seq versus RNAtag-seq.

(a) Scatterplots comparing WT-like strains from each of three major subclades (10 SC1A, 22 SC1B, 14 SC2A) using triplicates versus one randomly selected replicate from the 50-strain data. Presence of three biological replicates in the 50-strain data allowed us to simulate comparisons of averaged normalized counts when three versus one replicate were used. Strong correlation (r = 0.99) was observed for each triplicate- versus single-replicate comparison. Pearson correlation coefficient (r) is shown for each comparison. n represents number of samples (number of strains multiplied by number of replicates). (b) Seven strains were processed using the two protocols, that is, RNA-seq (three biological replicates per strain) and RNAtag-seq (singletons, that is, using single replicates). Principal component analysis of the seven strains processed using RNA-seq (three spheres colored cyan in the PCA plot) and RNAtag-seq (single sphere colored red in the PCA plot) displays overlapping spatial clustering. Expression profile of the 7 strains in the PCA plot is circled and numbered 1 through 7. Strains analyzed: 1-MGAS7888, 2-MGAS29284, 3-MGAS29553, 4-MGAS28746, 5-MGAS7914, 6-MGAS28647, and 7-MGAS28686. (c) Scatterplots were generated for the normalized counts (log-transformed) from the aforementioned seven strains processed using the two protocols, that is, RNA-seq and RNAtag-seq. For each strain, normalized transcript counts were averaged over the three biological replicates (RNA-seq protocol) and compared to RNAtag-seq normalized counts (singleton strain samples). Pearson correlation coefficient (r) is shown for each comparison.

Supplementary Figure 7 Strategy used to make pools and superpools and their sequencing read content (millions).

(a) Strategy used to make pools and superpools. Total RNA extracted from each strain (small yellow circles) was labeled with a unique barcoded oligoribonucleotide, and RNA from 8 strains was mixed to create one pool (gray circles), for a total of 58 pools; pool 58 contained RNA from only 5 strains. cDNAs from each pool were individually barcoded with Illumina P7 index oligonucleotides. Four different P7 oligonucleotides were used in this study. Four pools were mixed to form one superpool (large yellow circles), for a total of 15 superpools; superpool 15 contained only two pools. RNAtag-seq analysis was originally performed on 461 strains, and data are presented here for 442; data from 19 strains were excluded because of low sequence coverage. (b) Average number of sequence reads per pool for each of the 15 superpools is presented. Each circle represents the mean, and error bars represent the standard deviation (SD). The median was calculated using data for superpools 1–14 (each composed of four pools); superpool 15 contained only two pools. (c) Graph depicts the median number of reads per sample per pool, in millions. Median reads per sample for pools 57 and 58 are larger because of the higher sequencing depth of these pools.
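The pool and superpool counts in this legend follow from simple arithmetic, which the sketch below checks. The strain names and the consecutive assignment of strains to pools are hypothetical placeholders; the legend gives only the group sizes, not which strain went into which pool.

```python
# Numbers taken from the legend.
STRAINS = 461              # strains originally processed by RNAtag-seq
POOL_SIZE = 8              # strains per pool (the last pool is smaller)
POOLS_PER_SUPERPOOL = 4    # one per P7 index oligonucleotide
LOW_COVERAGE = 19          # strains excluded for low sequence coverage


def chunk(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# Placeholder strain names; the real assignment order is not given in the legend.
strains = [f"strain_{i + 1}" for i in range(STRAINS)]
pools = chunk(strains, POOL_SIZE)
superpools = chunk(pools, POOLS_PER_SUPERPOOL)

print(len(pools), len(pools[-1]))            # pools, strains in the last pool
print(len(superpools), len(superpools[-1]))  # superpools, pools in the last superpool
print(STRAINS - LOW_COVERAGE)                # strains with data presented
```

With these inputs, 57 full pools of 8 strains plus one pool of 5 account for all 461 strains, and 14 full superpools plus one superpool of 2 pools account for all 58 pools.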

Supplementary Figure 8 PCA plot of singleton strains and analysis of Cluster A ropB mutant strains.

(a) The two major clusters identified by DBSCAN are shown. (b) No subclade-specific clustering was evident within the two clusters. (c) Twenty strains with ropB mutations are outliers (colored yellow) and group away from the other strains with ropB mutations (colored orange). ropB-non-outlier strains cluster with WT-like strains (colored light blue) and strains with mutations in other major regulator genes (colored blue). (d) Cluster A ropB mutant strains separated into two groups validated by k-means clustering and were designated arbitrarily as Group I and Group II. (e) Group II ropB mutant strains had significantly decreased speB transcript levels compared to Group I strains (Mann-Whitney, two-tailed, P < 0.0001). (f) Mutations were mapped onto the crystal structure of the C-terminal region of the RopB protein. Variant amino acid positions associated with Group I or Group II organisms are labeled in red and pink, respectively. Amino acid residues present in inferred functional domains are demarcated with ovals. Mutations located in RopB functional domains were present at significantly increased frequency (test of proportions-one-tailed, P < 0.05) in Group II strains (pink labels within ovals) compared to Group I strains (red labels within ovals). PBD: peptide binding domain, NTD: N-terminal domain. The crystal structure of the NTD has not been solved. (g) Kaplan-Meier curve showing that the Group I (n = 3) and Group II (n = 4) strains differ significantly (log-rank test) in virulence in a mouse necrotizing myositis infection model (40 mice per strain). (h) Gross pathology images of infected mouse hindlimbs (n = 5 mice per strain) reflect the difference in virulence between the Group I (top) and Group II (bottom) strains, and representative images are displayed. Boxed areas demarcated in white illustrate major lesion areas.

Supplementary Figure 9 Lack of significant relationship between extent of transcriptome remodeling (number of DE genes) and genetic distance.

(a) Scatterplot comparing the number of differentially expressed genes (DE) and the genetic distance of the 442 singleton strains. For each of the strains, genes were called differentially expressed compared to reference strain MGAS28737. Genetic distance was measured as the number of core chromosomal SNPs compared to strain MGAS28737. Red line represents the line of regression. No significant correlation was observed between genetic distance and extent of transcriptome remodeling (number of DE genes) with R² value of 0.0046. (b) No improvement in correlation (R² = 0.0040) was observed when the analysis was conducted using only data for the 188 strains that have wild-type alleles for all known major regulatory genes. Red, SC1A; blue, SC1B; green, SC2A; yellow, SC2B. R² value was calculated by linear regression analysis.
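As an illustration of how an R² value of this kind is obtained, the following sketch fits an ordinary least-squares line to paired values and reports the coefficient of determination. The function name and the input values are hypothetical; the legend does not say which software performed the regression.

```python
def r_squared(x, y):
    """Coefficient of determination for a least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot


# Made-up example: core SNP distance to a reference strain versus the
# number of DE genes relative to that reference, for six strains.
snp_distance = [10, 20, 30, 40, 50, 60]
de_genes = [200, 50, 180, 60, 190, 70]
r2 = r_squared(snp_distance, de_genes)
print(f"R^2 = {r2:.4f}")
```

An R² near zero, as reported in both panels, means that a straight line through genetic distance explains almost none of the variation in the number of DE genes.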

Supplementary Figure 10 Genome-wide association analysis and eQTL analysis of 442 strains.

(a) Genome-wide association analysis was performed on 442 strains. Manhattan plot showing statistical significance (y-axis) of each k-mer (red circles) positively associated with high transcript expression of genes Spy1336/R28 and Spy1337, and their position along the 1.9 Mb GAS genome. Significant k-mers mapped to only one region of the chromosome, corresponding to the intergenic region between the Spy1336/R28 and Spy1337 genes. The top part is a schematic of the GAS genome, with vertical blue lines corresponding to open reading frames (ORFs) encoded by each strand of the chromosome. The bottom part shows an enlargement of the genome location corresponding to Spy1336/R28 and Spy1337, and the intergenic region. P values were computed by SEER software (Methods). (b) eQTL analysis identifies significant association between genotype (9T versus 10T) and expression level of genes Spy1336/R28 and Spy1337 in 50 strains at mid-exponential phase (left panel) and in 442 strains at early-stationary phase (right panel). Horizontal black bars represent mean transcript expression and standard deviation. PeQTL refers to q-values (False discovery rate, FDR) as reported by MatrixEQTL package. The threshold used for genome-wide significance was adjusted P value < 10e-8.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–10, Supplementary Note and Supplementary Tables 1, 7, 9, 10, 13 and 17–19

Reporting Summary

Supplementary Table 2

SNPs largely present in SC2A but absent in SC1A post Gubbins

Supplementary Table 3

Inferred MGE content for the 20 most prevalent MGE genotypes in the S. pyogenes emm28 cohort

Supplementary Table 4

MGE genotype based on the presence or absence of 50 phage and ICE encoded genes, 31 integrases and 19 secreted virulence factors, derived from MGEs identified in 60 complete S. pyogenes

Supplementary Table 5

SRST2 MGE-50 absence/presence matrix and genotype

Supplementary Table 6

Supplementary Table 8

List of differentially expressed genes comparing the three major genetic subclades at mid-exponential and stationary phase

Supplementary Table 11

Regulatory gene mutation prediction by machine learning

Supplementary Table 12

List of differentially expressed genes comparing transcriptomic clusters within CovR/CovS mutant strains

Supplementary Table 14

List of differentially expressed genes between group II versus group I ropB mutant strains

Supplementary Table 15

List of differentially expressed genes comparing the isogenic strains with either 9Ts or 10Ts in the intergenic region between the Spy1336/R28 and Spy1337 genes

Supplementary Table 16

Results of eQTL analysis

Supplementary Table 20

Data quality metrics for the 2,101 emm28 cohort

About this article

Cite this article

Kachroo, P., Eraso, J.M., Beres, S.B. et al. Integrated analysis of population genomics, transcriptomics and virulence provides novel insights into Streptococcus pyogenes pathogenesis. Nat Genet 51, 548–559 (2019). https://doi.org/10.1038/s41588-018-0343-1

Received: 12 April 2018
Accepted: 21 December 2018
Published: 18 February 2019
Issue Date: March 2019
DOI: https://doi.org/10.1038/s41588-018-0343-1
