Letter

Identifying lineage effects when controlling for population structure improves power in bacterial association studies

  • Nature Microbiology 1, Article number: 16041 (2016)
  • doi:10.1038/nmicrobiol.2016.41

Abstract

Bacteria pose unique challenges for genome-wide association studies because of strong structuring into distinct strains and substantial linkage disequilibrium across the genome [1,2]. Although methods developed for human studies can correct for strain structure [3,4], this risks considerable loss of power because genetic differences between strains often contribute substantial phenotypic variability [5]. Here, we propose a new method that captures lineage-level associations even when locus-specific associations cannot be fine-mapped. We demonstrate its ability to detect genes and genetic variants underlying resistance to 17 antimicrobials in 3,144 isolates from four taxonomically diverse clonal and recombining bacteria: Mycobacterium tuberculosis, Staphylococcus aureus, Escherichia coli and Klebsiella pneumoniae. Strong selection, recombination and penetrance confer high power to recover known antimicrobial resistance mechanisms and reveal a candidate association between the outer membrane porin nmpC and cefazolin resistance in E. coli. Hence, our method pinpoints locus-specific effects where possible and boosts power by detecting lineage-level differences when fine-mapping is intractable.
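The idea of a lineage-level association can be illustrated with a small simulation. The sketch below is not the authors' implementation: it assumes that principal components of a binary genotype matrix can stand in for lineages, uses ordinary least squares rather than a mixed model, and runs on simulated data only. All function names and parameter values are hypothetical. It shows that when variants within a lineage are in near-complete linkage disequilibrium, and so cannot be told apart individually, a test on the components can still pick up the lineage-wide phenotypic difference.

```python
import numpy as np


def simulate(n_per_lineage=50, n_lineages=4, n_loci=200, seed=0):
    """Toy clonal population (hypothetical parameters).

    Each locus is carried by exactly one lineage, so loci owned by the
    same lineage are in near-complete linkage disequilibrium.
    """
    rng = np.random.default_rng(seed)
    lineage = np.repeat(np.arange(n_lineages), n_per_lineage)
    owner = rng.integers(0, n_lineages, size=n_loci)
    genotypes = (lineage[:, None] == owner[None, :]).astype(float)
    # Flip about 2% of calls so that loci are not perfectly identical.
    flips = rng.random(genotypes.shape) < 0.02
    genotypes = np.where(flips, 1.0 - genotypes, genotypes)
    # The phenotype is shifted in lineage 0 only (a lineage-level effect).
    phenotype = 1.0 * (lineage == 0) + rng.normal(0.0, 0.5, size=lineage.size)
    return genotypes, phenotype, lineage


def principal_components(genotypes, k):
    """Return the first k principal component scores (samples x k)."""
    centered = genotypes - genotypes.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :k] * s[:k]


def wald_statistics(pcs, phenotype):
    """Regress the phenotype on the components by least squares and
    return one Wald statistic (estimate^2 / variance) per component."""
    n, k = pcs.shape
    design = np.column_stack([np.ones(n), pcs])
    beta, *_ = np.linalg.lstsq(design, phenotype, rcond=None)
    residuals = phenotype - design @ beta
    sigma2 = residuals @ residuals / (n - k - 1)
    covariance = sigma2 * np.linalg.inv(design.T @ design)
    return beta[1:] ** 2 / np.diag(covariance)[1:]


if __name__ == "__main__":
    genotypes, phenotype, _ = simulate()
    pcs = principal_components(genotypes, k=3)
    for index, stat in enumerate(wald_statistics(pcs, phenotype), start=1):
        print(f"PC{index}: Wald statistic = {stat:.1f}")
```

Under the null hypothesis each statistic is approximately chi-squared distributed with one degree of freedom, so values far above about 10.8 (p = 0.001) suggest that the corresponding component, and by assumption the lineage it tracks, is associated with the phenotype.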


References

  1. Recombination and the structures of bacterial pathogens. Annu. Rev. Microbiol. 55, 561–590 (2001).
  2. Genome-wide association mapping in bacteria? Trends Microbiol. 14, 353–355 (2006).
  3. Bayesian statistical methods for genetic association studies. Nature Rev. Genet. 10, 681–690 (2009).
  4. Five years of GWAS discovery. Am. J. Hum. Genet. 90, 7–24 (2012).
  5. Explaining microbial genomic diversity in light of evolutionary ecology. Nature Rev. Microbiol. 12, 263–273 (2014).
  6. Prokaryotes: the unseen majority. Proc. Natl Acad. Sci. USA 95, 6578–6583 (1998).
  7. The microbial engines that drive Earth's biogeochemical cycles. Science 320, 1034–1039 (2008).
  8. World Health Organization. The Global Burden of Disease: 2004 Update (2008).
  9. Origins and evolution of antibiotic resistance. Microbiol. Mol. Biol. Rev. 74, 417–433 (2010).
  10. European Centre for Disease Prevention and Control. Surveillance of Surgical-Site Infections in Europe, 2008–2009 (2012).
  11. World Health Organization. Global Tuberculosis Report 2014 (2014).
  12. World Health Organization. Antimicrobial Resistance: A Global Report on Surveillance (2014).
  13. et al. Genome-wide association study identifies vitamin B5 biosynthesis as a host specificity factor in Campylobacter. Proc. Natl Acad. Sci. USA 110, 11923–11927 (2013).
  14. et al. Dissecting vancomycin-intermediate resistance in Staphylococcus aureus using genome-wide association. Genome Biol. Evol. 6, 1174–1185 (2014).
  15. et al. Predicting the virulence of MRSA from its genome sequence. Genome Res. 24, 839–849 (2014).
  16. et al. Comprehensive identification of single nucleotide polymorphisms associated with beta-lactam resistance within pneumococcal mosaic genes. PLoS Genet. 10, e1004547 (2014).
  17. et al. Large-scale genomic sequencing of extraintestinal pathogenic Escherichia coli strains. Genome Res. 25, 119–128 (2014).
  18. Characterizing the genetic basis of bacterial phenotypes using genome-wide association studies: a new direction for bacteriology. Genome Med. 6, 109 (2014).
  19. A phylogeny-based sampling strategy and power calculator informs genome-wide associations study design for microbial pathogens. Genome Med. 6, 101 (2014).
  20. SNP-associations and phenotype predictions from hundreds of microbial genomes without genome alignments. PLoS ONE 9, e90490 (2014).
  21. The advent of genome-wide association studies for bacteria. Curr. Opin. Microbiol. 25, 17–24 (2015).
  22. et al. Genomic analysis of diversity, population structure, virulence, and antimicrobial resistance in Klebsiella pneumoniae, an urgent threat to public health. Proc. Natl Acad. Sci. USA 112, E3574–E3581 (2015).
  23. New approaches to population stratification in genome-wide association studies. Nature Rev. Genet. 11, 459–463 (2010).
  24. et al. Population genetics of microbial pathogens estimated from multilocus sequence typing (MLST) data. Infect. Genet. Evol. 6, 97–112 (2006).
  25. A comparison of homologous recombination rates in bacteria and archaea. ISME J. 3, 199–208 (2009).
  26. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nature Genet. 38, 904–909 (2006).
  27. Genetic basis of resistance to fusidic acid in staphylococci. Antimicrob. Agents Chemother. 51, 1737–1740 (2007).
  28. Genome-wide efficient mixed-model analysis for association studies. Nature Genet. 44, 821–824 (2012).
  29. Advantages and pitfalls in the application of mixed-model association methods. Nature Genet. 46, 100–106 (2014).
  30. The phylogenetic regression. Phil. Trans. R. Soc. Lond. B 326, 119–157 (1989).
  31. Phylogenies and the comparative method: a general approach to incorporating phylogenetic information into the analysis of interspecific data. Am. Nat. 149, 646–667 (1997).
  32. Molecular evolution of the Escherichia coli chromosome. III. Clonal frames. Genetics 126, 505–517 (1990).
  33. A genealogical interpretation of principal components analysis. PLoS Genet. 5, e1000686 (2009).
  34. Population structure and cryptic relatedness in genetic association studies. Stat. Sci. 24, 451–471 (2009).
  35. Tests of statistical hypotheses concerning several parameters when the number of observations is large. Trans. Am. Math. Soc. 54, 426–482 (1943).
  36. et al. Whole genome sequencing for prediction of Mycobacterium tuberculosis drug susceptibility and resistance: a retrospective cohort study. Lancet Infect. Dis. 15, 1193–1202 (2015).
  37. et al. Prediction of Staphylococcus aureus antimicrobial resistance by whole-genome sequencing. J. Clin. Microbiol. 52, 1182–1191 (2014).
  38. et al. Predicting antimicrobial susceptibilities for Escherichia coli and Klebsiella pneumoniae isolates using whole genome sequence data. J. Antimicrob. Chemother. 68, 2234–2244 (2013).
  39. et al. Rapid antibiotic-resistance predictions from genome sequence data for Staphylococcus aureus and Mycobacterium tuberculosis. Nature Commun. 6, 10063 (2015).
  40. Contribution of gene amplification to evolution of increased antibiotic resistance in Salmonella typhimurium. Genetics 182, 1183–1195 (2009).
  41. et al. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nature Genet. 38, 203–208 (2006).
  42. et al. Efficient control of population structure in model organism association mapping. Genetics 178, 1709–1723 (2008).
  43. et al. Variance component model to account for sample structure in genome-wide association studies. Nature Genet. 42, 348–354 (2010).
  44. et al. FaST linear mixed models for genome-wide association studies. Nature Methods 8, 833–835 (2011).
  45. et al. Improved linear mixed models for genome-wide association studies. Nature Methods 9, 525–526 (2012).
  46. in Kendall's Advanced Theory of Statistics Volume 2B Bayesian Inference 2nd edn, Ch. 11 (Wiley-Blackwell, 2010).
  47. et al. A pilot study of rapid benchtop sequencing of Staphylococcus aureus and Clostridium difficile for outbreak detection and surveillance. BMJ Open 2, e001124 (2012).
  48. et al. Mobile elements drive recombination hotspots in the core genome of Staphylococcus aureus. Nature Commun. 5, 3956 (2014).
  49. Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res. 21, 936–939 (2011).
  50. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 18, 821–829 (2008).
  51. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119 (2010).
  52. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658–1659 (2006).
  53. DSK: k-mer counting with very low memory usage. Bioinformatics 29, 652–653 (2013).
  54. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
  55. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014).
  56. Rapid and accurate haplotype phasing and missing data inference for whole genome association studies by use of localized haplotype clustering. Am. J. Hum. Genet. 81, 1084–1097 (2007).
  57. ClonalFrameML: efficient inference of recombination in whole bacterial genomes. PLoS Comput. Biol. 11, e1004041 (2015).
  58. Bacterial phylogenetic reconstruction from whole genomes is robust to recombination but demographic inference is not. mBio 5, e02158–14 (2014).
  59. A fast algorithm for joint reconstruction of ancestral amino acid sequences. Mol. Biol. Evol. 17, 890–896 (2000).
  60. Efficient inference of recombination hot regions in bacterial genomes. Mol. Biol. Evol. 31, 1593–1605 (2014).
  61. Estimation of the medians for dependent variables. Ann. Math. Stat. 30, 192–197 (1959).
  62. et al. BLAST+: architecture and applications. BMC Bioinformatics 10, 431 (2009).
  63. Fast gapped-read alignment with Bowtie 2. Nature Methods 9, 357–359 (2012).
  64. UniProt Consortium. UniProt: a hub for protein information. Nucleic Acids Res. 43, D204–D212 (2015).


Acknowledgements

The authors thank J.-B. Veyrieras, D. Charlesworth and B. Charlesworth for comments on the manuscript, X. Zhou and M. Stephens for helping adapt their software, S. Niemann for assisting with tuberculosis isolates and X. Didelot, D. Falush, R. Bowden, S. Myers, J. Marchini, J. Pickrell, P. Visscher, A. Price and P. Donnelly for discussions. This study was supported by the Oxford NIHR Biomedical Research Centre, a Mérieux Research Grant and the UKCRC Modernising Medical Microbiology Consortium, the latter funded under the UKCRC Translational Infection Research Initiative supported by the Medical Research Council, the Biotechnology and Biological Sciences Research Council and the National Institute for Health Research on behalf of the UK Department of Health (grant no. G0800778) and the Wellcome Trust (grant no. 087646/Z/08/Z). T.M.W. is an MRC research training fellow. C.C.A.S. was supported by a Wellcome Trust Career Development Fellowship (grant no. 097364/Z/11/Z). D.A.C. is funded by the Royal Academy of Engineering and an EPSRC Healthcare Technologies Challenge Award. T.E.P. and D.W.C. are NIHR Senior Investigators. G.M. is supported by a Wellcome Trust Investigator Award (grant no. 100956/Z/13/Z). D.J.W. and Z.I. are Sir Henry Dale Fellows, jointly funded by the Wellcome Trust and the Royal Society (grants nos. 101237/Z/13/Z and 102541/Z/13/Z).

Author information

Author notes

    • Sarah G. Earle
    • Chieh-Hsi Wu
    • Jane Charlesworth

    These authors contributed equally to this work.

Affiliations

  1. Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Oxford OX3 9DU, UK

    • Sarah G. Earle
    • Chieh-Hsi Wu
    • Jane Charlesworth
    • Nicole Stoesser
    • N. Claire Gordon
    • Timothy M. Walker
    • Tim E. Peto
    • Derrick W. Crook
    • A. Sarah Walker
    • Daniel J. Wilson

  2. Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford OX3 7BN, UK

    • Chris C. A. Spencer
    • Zamin Iqbal
    • Gil McVean
    • Daniel J. Wilson

  3. Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford OX3 7DQ, UK

    • David A. Clifton

  4. Antimicrobial Resistance and Healthcare Associated Infections Reference Unit, Public Health England, London NW9 5EQ, UK

    • Katie L. Hopkins
    • Neil Woodford

  5. Public Health England, West Midlands Public Health Laboratory, Heartlands Hospital, Birmingham B9 5SS, UK

    • E. Grace Smith

  6. Centre for Tuberculosis, National Institute for Communicable Diseases, Johannesburg 2131, South Africa

    • Nazir Ismail

  7. Department of Medical Microbiology, University of Pretoria, Pretoria, South Africa

    • Nazir Ismail

  8. Department of Infectious Diseases and Microbiology, Royal Sussex County Hospital, Brighton BN2 5BE, UK

    • Martin J. Llewelyn

  9. The Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7FZ, UK

    • Gil McVean

Authors

  1. Sarah G. Earle
  2. Chieh-Hsi Wu
  3. Jane Charlesworth
  4. Nicole Stoesser
  5. N. Claire Gordon
  6. Timothy M. Walker
  7. Chris C. A. Spencer
  8. Zamin Iqbal
  9. David A. Clifton
  10. Katie L. Hopkins
  11. Neil Woodford
  12. E. Grace Smith
  13. Nazir Ismail
  14. Martin J. Llewelyn
  15. Tim E. Peto
  16. Derrick W. Crook
  17. Gil McVean
  18. A. Sarah Walker
  19. Daniel J. Wilson

Contributions

S.G.E., C.-H.W., J.C. and D.J.W. designed the study, developed the methods, performed the analysis, interpreted the results and wrote the manuscript. Z.I. and D.A.C. assisted the analysis and commented on the manuscript. N.S., N.C.G., T.M.W., K.L.H., N.W., E.G.S., N.I., M.J.L., T.E.P. and D.W.C. designed and implemented isolate collection, drug susceptibility testing and whole-genome sequencing, and assisted with interpretation. C.C.A.S., G.M. and A.S.W. assisted with methods development and writing of the manuscript.

Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to Daniel J. Wilson.

Supplementary information

PDF files

  1. Supplementary information

    Supplementary Figures 1–10 and Tables 1–5

Excel files

  1. Supplementary Data 1

    Individual BioSample accession numbers and antimicrobial resistance phenotypes.