  • Letter

Identifying lineage effects when controlling for population structure improves power in bacterial association studies

Abstract

Bacteria pose unique challenges for genome-wide association studies because of strong structuring into distinct strains and substantial linkage disequilibrium across the genome [1,2]. Although methods developed for human studies can correct for strain structure [3,4], this risks considerable loss of power because genetic differences between strains often contribute substantial phenotypic variability [5]. Here, we propose a new method that captures lineage-level associations even when locus-specific associations cannot be fine-mapped. We demonstrate its ability to detect genes and genetic variants underlying resistance to 17 antimicrobials in 3,144 isolates from four taxonomically diverse clonal and recombining bacteria: Mycobacterium tuberculosis, Staphylococcus aureus, Escherichia coli and Klebsiella pneumoniae. Strong selection, recombination and penetrance confer high power to recover known antimicrobial resistance mechanisms and reveal a candidate association between the outer membrane porin nmpC and cefazolin resistance in E. coli. Hence, our method pinpoints locus-specific effects where possible and boosts power by detecting lineage-level differences when fine-mapping is intractable.
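The idea of a lineage-level association can be pictured with a small simulation. The sketch below is not the authors' implementation: it assumes, purely for illustration, that lineages are summarised by the leading principal components of a 0/1 genotype matrix and that each component is tested against the phenotype with a Wald statistic from ordinary least squares. The function name `lineage_wald_tests`, the simulated data and all parameter values are hypothetical.

```python
import numpy as np

def lineage_wald_tests(genotypes, phenotype, n_components=2):
    """Wald statistic for each leading principal component (hypothetical helper).

    genotypes: (n_isolates, n_variants) array of 0/1 calls.
    phenotype: (n_isolates,) array of phenotype values.
    Assumption: principal components stand in for lineages.
    """
    g = genotypes - genotypes.mean(axis=0)           # centre each variant
    u, s, _ = np.linalg.svd(g, full_matrices=False)
    pcs = u[:, :n_components] * s[:n_components]     # isolate scores on each PC
    x = np.column_stack([np.ones(len(phenotype)), pcs])
    beta, *_ = np.linalg.lstsq(x, phenotype, rcond=None)
    resid = phenotype - x @ beta
    sigma2 = resid @ resid / (len(phenotype) - x.shape[1])
    cov = sigma2 * np.linalg.inv(x.T @ x)            # covariance of the estimates
    return beta[1:] ** 2 / np.diag(cov)[1:]          # one Wald statistic per PC

# Toy data (all values are illustrative assumptions): two lineages of 50
# isolates, 20 lineage-defining variants and 20 unstructured variants.
rng = np.random.default_rng(0)
lineage = np.repeat([0, 1], 50)
lineage_variants = np.tile(lineage[:, None], (1, 20))
noise_variants = rng.integers(0, 2, size=(100, 20))
genotypes = np.hstack([lineage_variants, noise_variants]).astype(float)

# The phenotype differs between lineages, but no single variant within the
# lineage-defining block can be distinguished from the others.
phenotype = 2.0 * lineage + rng.normal(0, 0.5, size=100)

wald = lineage_wald_tests(genotypes, phenotype)
print(wald)
```

In this toy setting the 20 lineage-defining variants are perfectly correlated, so none of them can be fine-mapped, yet the first principal component still carries a strong signal. That is the situation the abstract describes as detecting lineage-level differences when fine-mapping is intractable.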


Figure 1: Controlling for population structure in bacterial GWASs for fusidic acid resistance in S. aureus.
Figure 2: Power, false positives, fine mapping and homoplasy in S. aureus. Simulation results.


References

  1. Feil, E. J. & Spratt, B. G. Recombination and the structures of bacterial pathogens. Annu. Rev. Microbiol. 55, 561–590 (2001).


  2. Falush, D. & Bowden, R. Genome-wide association mapping in bacteria? Trends Microbiol. 14, 353–355 (2006).


  3. Stephens, M. & Balding, D. J. Bayesian statistical methods for genetic association studies. Nature Rev. Genet. 10, 681–690 (2009).


  4. Visscher, P. M., Brown, M. A., McCarthy, M. I. & Yang, J. Five years of GWAS discovery. Am. J. Hum. Genet. 90, 7–24 (2012).


  5. Cordero, O. X. & Polz, M. F. Explaining microbial genomic diversity in light of evolutionary ecology. Nature Rev. Microbiol. 12, 263–273 (2014).


  6. Whitman, W. B., Coleman, D. C. & Wiebe, W. J. Prokaryotes: the unseen majority. Proc. Natl Acad. Sci. USA 95, 6578–6583 (1998).


  7. Falkowski, P. G., Fenchel, T. & Delong, E. F. The microbial engines that drive Earth's biogeochemical cycles. Science 320, 1034–1039 (2008).


  8. World Health Organization. The Global Burden of Disease: 2004 Update (2008); http://www.who.int/healthinfo/global_burden_disease

  9. Davies, J. & Davies, D. Origins and evolution of antibiotic resistance. Microbiol. Mol. Biol. Rev. 74, 417–433 (2010).


  10. European Centre for Disease Prevention and Control. Surveillance of Surgical-Site Infections in Europe, 2008–2009 (2012); http://www.ecdc.europa.eu/en/publications/Publications/120215_SUR_SSI_2008-2009.pdf

  11. World Health Organization. Global Tuberculosis Report 2014 (2014); http://apps.who.int/iris/bitstream/10665/137094/1/9789241564809_eng.pdf

  12. World Health Organization. Antimicrobial Resistance: A Global Report on Surveillance (2014); http://www.who.int/iris/bitstream/10665/112642/1/9789241564748_eng.pdf

  13. Sheppard, S. K. et al. Genome-wide association study identifies vitamin B5 biosynthesis as a host specificity factor in Campylobacter. Proc. Natl Acad. Sci. USA 110, 11923–11927 (2013).


  14. Alam, M. T. et al. Dissecting vancomycin-intermediate resistance in Staphylococcus aureus using genome-wide association. Genome Biol. Evol. 6, 1174–1185 (2014).


  15. Laabei, M. et al. Predicting the virulence of MRSA from its genome sequence. Genome Res. 24, 839–849 (2014).


  16. Chewapreecha, C. et al. Comprehensive identification of single nucleotide polymorphisms associated with beta-lactam resistance within pneumococcal mosaic genes. PLoS Genet. 10, e1004547 (2014).


  17. Salipante, S. J. et al. Large-scale genomic sequencing of extraintestinal pathogenic Escherichia coli strains. Genome Res. 25, 119–128 (2014).


  18. Read, T. D. & Massey, R. C. Characterizing the genetic basis of bacterial phenotypes using genome-wide association studies: a new direction for bacteriology. Genome Med. 6, 109 (2014).


  19. Farhat, M. R., Shapiro, B. J., Sheppard, S. K., Colijn, C. & Murray, M. A phylogeny-based sampling strategy and power calculator informs genome-wide associations study design for microbial pathogens. Genome Med. 6, 101 (2014).


  20. Hall, B. G. SNP-associations and phenotype predictions from hundreds of microbial genomes without genome alignments. PLoS ONE 9, e90490 (2014).


  21. Chen, P. E. & Shapiro, B. J. The advent of genome-wide association studies for bacteria. Curr. Opin. Microbiol. 25, 17–24 (2015).


  22. Holt, K. E. et al. Genomic analysis of diversity, population structure, virulence, and antimicrobial resistance in Klebsiella pneumoniae, an urgent threat to public health. Proc. Natl Acad. Sci. USA 112, E3574–E3581 (2015).


  23. Price, A. L., Zaitlen, N. A., Reich, D. & Patterson, N. New approaches to population stratification in genome-wide association studies. Nature Rev. Genet. 11, 459–463 (2010).


  24. Perez-Losada, M. et al. Population genetics of microbial pathogens estimated from multilocus sequence typing (MLST) data. Infect. Genet. Evol. 6, 97–112 (2006).


  25. Vos, M. & Didelot, X. A comparison of homologous recombination rates in bacteria and archaea. ISME J. 3, 199–208 (2009).


  26. Price, A. L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nature Genet. 38, 904–909 (2006).


  27. O'Neill, A. J., McLaws, F., Kahlmeter, G., Henriksen, A. S. & Chopra, I. Genetic basis of resistance to fusidic acid in staphylococci. Antimicrob. Agents Chemother. 51, 1737–1740 (2007).


  28. Zhou, X. & Stephens, M. Genome-wide efficient mixed-model analysis for association studies. Nature Genet. 44, 821–824 (2012).


  29. Yang, J., Zaitlen, N. A., Goddard, M. E., Visscher, P. M. & Price, A. L. Advantages and pitfalls in the application of mixed-model association methods. Nature Genet. 46, 100–106 (2014).


  30. Grafen, A. The phylogenetic regression. Phil. Trans. R. Soc. Lond. B 326, 119–157 (1989).


  31. Martins, E. P. & Hansen, T. F. Phylogenies and the comparative method: a general approach to incorporating phylogenetic information into the analysis of interspecific data. Am. Nat. 149, 646–667 (1997).


  32. Milkman, R. & Bridges, M. M. Molecular evolution of the Escherichia coli chromosome. III. Clonal frames. Genetics 126, 505–517 (1990).


  33. McVean, G. A genealogical interpretation of principal components analysis. PLoS Genet. 5, e1000686 (2009).


  34. Astle, W. & Balding, D. J. Population structure and cryptic relatedness in genetic association studies. Stat. Sci. 24, 451–471 (2009).


  35. Wald, A. Tests of statistical hypotheses concerning several parameters when the number of observations is large. Trans. Am. Math. Soc. 54, 426–482 (1943).


  36. Walker, T. M. et al. Whole genome sequencing for prediction of Mycobacterium tuberculosis drug susceptibility and resistance: a retrospective cohort study. Lancet Infect. Dis. 15, 1193–1202 (2015).


  37. Gordon, N. C. et al. Prediction of Staphylococcus aureus antimicrobial resistance by whole-genome sequencing. J. Clin. Microbiol. 52, 1182–1191 (2014).


  38. Stoesser, N. et al. Predicting antimicrobial susceptibilities for Escherichia coli and Klebsiella pneumoniae isolates using whole genome sequence data. J. Antimicrob. Chemother. 68, 2234–2244 (2013).


  39. Bradley, P. et al. Rapid antibiotic-resistance predictions from genome sequence data for Staphylococcus aureus and Mycobacterium tuberculosis. Nature Commun. 6, 10063 (2015).


  40. Sun, S., Berg, O. G., Roth, J. R. & Andersson, D. I. Contribution of gene amplification to evolution of increased antibiotic resistance in Salmonella typhimurium. Genetics 182, 1183–1195 (2009).


  41. Yu, J. et al. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nature Genet. 38, 203–208 (2006).


  42. Kang, H. M. et al. Efficient control of population structure in model organism association mapping. Genetics 178, 1709–1723 (2008).


  43. Kang, H. M. et al. Variance component model to account for sample structure in genome-wide association studies. Nature Genet. 42, 348–354 (2010).


  44. Lippert, C. et al. FaST linear mixed models for genome-wide association studies. Nature Methods 8, 833–835 (2011).


  45. Listgarten, J. et al. Improved linear mixed models for genome-wide association studies. Nature Methods 9, 525–526 (2012).


  46. O'Hagan, A. & Forster, J. in Kendall's Advanced Theory of Statistics Volume 2B Bayesian Inference 2nd edn, Ch. 11 (Wiley-Blackwell, 2010).


  47. Eyre, D. W. et al. A pilot study of rapid benchtop sequencing of Staphylococcus aureus and Clostridium difficile for outbreak detection and surveillance. BMJ Open 2, e001124 (2012).


  48. Everitt, R. G. et al. Mobile elements drive recombination hotspots in the core genome of Staphylococcus aureus. Nature Commun. 5, 3956 (2014).


  49. Lunter, G. & Goodson, M. Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res. 21, 936–939 (2011).


  50. Zerbino, D. R. & Birney, E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 18, 821–829 (2008).


  51. Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119 (2010).


  52. Li, W. & Godzik, A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658–1659 (2006).


  53. Rizk, G., Lavenier, D. & Chikhi, R. DSK: k-mer counting with very low memory usage. Bioinformatics 29, 652–653 (2013).


  54. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).


  55. Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014).


  56. Browning, S. R. & Browning, B. L. Rapid and accurate haplotype phasing and missing data inference for whole genome association studies by use of localized haplotype clustering. Am. J. Hum. Genet. 81, 1084–1097 (2007).


  57. Didelot, X. & Wilson, D. J. ClonalFrameML: efficient inference of recombination in whole bacterial genomes. PLoS Comput. Biol. 11, e1004041 (2015).


  58. Hedge, J. & Wilson, D. J. Bacterial phylogenetic reconstruction from whole genomes is robust to recombination but demographic inference is not. mBio 5, e02158–14 (2014).


  59. Pupko, T., Pe'er, I., Shamir, R. & Graur, D. A fast algorithm for joint reconstruction of ancestral amino acid sequences. Mol. Biol. Evol. 17, 890–896 (2000).


  60. Yahara, K., Didelot, X., Ansari, M., Sheppard, S. K. & Falush, D. Efficient inference of recombination hot regions in bacterial genomes. Mol. Biol. Evol. 31, 1593–1605 (2014).


  61. Dunn, O. J. Estimation of the medians for dependent variables. Ann. Math. Stat. 30, 192–197 (1959).


  62. Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinformatics 10, 431 (2009).


  63. Langmead, B. & Salzberg, S. Fast gapped-read alignment with Bowtie 2. Nature Methods 9, 357–359 (2012).


  64. UniProt Consortium. UniProt: a hub for protein information. Nucleic Acids Res. 43, D204–D212 (2015).



Acknowledgements

The authors thank J.-B. Veyrieras, D. Charlesworth and B. Charlesworth for comments on the manuscript, X. Zhou and M. Stephens for helping adapt their software, S. Niemann for assisting with tuberculosis isolates and X. Didelot, D. Falush, R. Bowden, S. Myers, J. Marchini, J. Pickrell, P. Visscher, A. Price and P. Donnelly for discussions. This study was supported by the Oxford NIHR Biomedical Research Centre, a Mérieux Research Grant and the UKCRC Modernising Medical Microbiology Consortium, the latter funded under the UKCRC Translational Infection Research Initiative supported by the Medical Research Council, the Biotechnology and Biological Sciences Research Council and the National Institute for Health Research on behalf of the UK Department of Health (grant no. G0800778) and the Wellcome Trust (grant no. 087646/Z/08/Z). T.M.W. is an MRC research training fellow. C.C.A.S. was supported by a Wellcome Trust Career Development Fellowship (grant no. 097364/Z/11/Z). D.A.C. is funded by the Royal Academy of Engineering and an EPSRC Healthcare Technologies Challenge Award. T.E.P. and D.W.C. are NIHR Senior Investigators. G.M. is supported by a Wellcome Trust Investigator Award (grant no. 100956/Z/13/Z). D.J.W. and Z.I. are Sir Henry Dale Fellows, jointly funded by the Wellcome Trust and the Royal Society (grants nos. 101237/Z/13/Z and 102541/Z/13/Z).

Author information

Contributions

S.G.E., C.-H.W., J.C. and D.J.W. designed the study, developed the methods, performed the analysis, interpreted the results and wrote the manuscript. Z.I. and D.A.C. assisted the analysis and commented on the manuscript. N.S., N.C.G., T.M.W., K.L.H., N.W., E.G.S., N.I., M.J.L., T.E.P. and D.W.C. designed and implemented isolate collection, drug susceptibility testing and whole-genome sequencing, and assisted with interpretation. C.C.A.S., G.M. and A.S.W. assisted with methods development and writing of the manuscript.

Corresponding author

Correspondence to Daniel J. Wilson.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information


Supplementary Figures 1-10 and Tables 1-5 (PDF 18390 kb)

Supplementary Data 1

Individual BioSample accession numbers and antimicrobial resistance phenotypes. (XLSX 260 kb)

About this article


Cite this article

Earle, S., Wu, CH., Charlesworth, J. et al. Identifying lineage effects when controlling for population structure improves power in bacterial association studies. Nat Microbiol 1, 16041 (2016). https://doi.org/10.1038/nmicrobiol.2016.41

  • DOI: https://doi.org/10.1038/nmicrobiol.2016.41
