
Multinucleotide mutations cause false inferences of lineage-specific positive selection

Abstract

Phylogenetic tests of adaptive evolution, such as the widely used branch-site test (BST), assume that nucleotide substitutions occur singly and independently. Recent research has shown that errors at adjacent sites often occur during DNA replication, and the resulting multinucleotide mutations (MNMs) are overwhelmingly likely to be non-synonymous. To evaluate whether the BST misinterprets sequence patterns produced by MNMs as spurious support for positive selection, we analysed two genome-scale datasets, one from mammals and one from flies. We found that codons with multiple differences (CMDs) account for virtually all the support for lineage-specific positive selection in the BST. Simulations under conditions derived from these alignments, but without positive selection, show that realistic rates of MNMs cause a strong and systematic bias towards false inferences of selection. Under empirically derived conditions, this bias is sufficient to produce false-positive inferences as often as the BST infers positive selection from the empirical data. Although some genes with BST-positive results may have evolved adaptively, the test cannot distinguish sequence patterns produced by authentic positive selection from those caused by neutral fixation of MNMs. Many published inferences of adaptive evolution using this technique may therefore be artefacts of model violation caused by unincorporated neutral mutational processes. We introduce a model that incorporates MNMs and may help to ameliorate this bias.
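To illustrate why MNMs are so likely to be non-synonymous, the following sketch enumerates, under the standard genetic code, every single-nucleotide change and every change at two adjacent positions within a codon, starting from each sense codon, and reports the fraction that alter the encoded amino acid. It is a minimal illustration, not the authors' analysis: it assumes every sense codon and every base change are equally likely, ignores MNMs that span codon boundaries, and drops changes that create stop codons. The function and variable names (`mutants`, `fraction_nonsynonymous`) are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): AMINO[i] for i, c in enumerate(product(BASES, repeat=3))}


def mutants(codon, positions):
    """Yield every codon that differs from `codon` at exactly `positions`."""
    choices = [[b for b in BASES if b != codon[p]] for p in positions]
    for subs in product(*choices):
        new = list(codon)
        for p, b in zip(positions, subs):
            new[p] = b
        yield "".join(new)


def fraction_nonsynonymous(position_sets):
    """Fraction of non-nonsense mutations that change the amino acid.

    Every sense codon and every base change is weighted equally (an assumption).
    """
    syn = nonsyn = 0
    for codon, aa in CODE.items():
        if aa == "*":
            continue  # start only from sense codons
        for positions in position_sets:
            for m in mutants(codon, positions):
                if CODE[m] == "*":
                    continue  # drop changes that create a stop codon
                if CODE[m] == aa:
                    syn += 1
                else:
                    nonsyn += 1
    return nonsyn / (syn + nonsyn)


single = fraction_nonsynonymous([(0,), (1,), (2,)])
adjacent_double = fraction_nonsynonymous([(0, 1), (1, 2)])
print(f"single-nucleotide: {single:.3f}  adjacent double: {adjacent_double:.3f}")
```

Under these simplifying assumptions, roughly three quarters of single-nucleotide changes are non-synonymous, whereas more than 99% of adjacent double changes are; the only synonymous ones are the serine interchanges between TCT/TCC and AGT/AGC.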

Fig. 1: CMDs drive branch-site signatures of selection.
Fig. 2: Incorporating MNMs into the branch-site model eliminates the signature of positive selection in many genes.
Fig. 3: MNMs cause a strong bias in the BST under realistic conditions.
Fig. 4: Transversion enrichment in CMDs biases the BST.
Fig. 5: CMDs implying multiple non-synonymous steps drive the BST.
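The codons with multiple differences (CMDs) named in Figs. 1 and 5 can be illustrated with a small pairwise sketch. Assuming two ungapped, in-frame coding sequences of equal length and the standard genetic code, it flags codons that differ at two or more positions and, for each, counts the non-synonymous steps along every single-nucleotide path between the two codons that avoids stop codons. The helpers (`find_cmds`, `nonsyn_steps_per_path`) are hypothetical and for illustration only; this is not the authors' pipeline, which applies the BST on a phylogeny rather than comparing two sequences.

```python
from itertools import permutations, product

BASES = "TCAG"
# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): AMINO[i] for i, c in enumerate(product(BASES, repeat=3))}


def diff_positions(a, b):
    """Codon positions (0-2) at which codons a and b differ."""
    return [i for i in range(3) if a[i] != b[i]]


def nonsyn_steps_per_path(a, b):
    """Number of non-synonymous steps on each single-nucleotide path from a to b.

    Paths that pass through (or end at) a stop codon are skipped.
    """
    counts = []
    for order in permutations(diff_positions(a, b)):
        current, steps, valid = a, 0, True
        for pos in order:
            nxt = current[:pos] + b[pos] + current[pos + 1:]
            if CODE[nxt] == "*":
                valid = False
                break
            steps += CODE[nxt] != CODE[current]  # True counts as one step
            current = nxt
        if valid:
            counts.append(steps)
    return counts


def find_cmds(seq1, seq2):
    """List (codon index, codon1, codon2, per-path non-synonymous counts) for CMDs."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned and of equal length")
    cmds = []
    for i in range(0, len(seq1) - len(seq1) % 3, 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        if not set(c1 + c2) <= set(BASES):
            continue  # skip codons with gaps or ambiguous bases
        if len(diff_positions(c1, c2)) >= 2:
            cmds.append((i // 3, c1, c2, nonsyn_steps_per_path(c1, c2)))
    return cmds


print(find_cmds("TCTGCAAAA", "AGTGCGCGA"))
# [(0, 'TCT', 'AGT', [2, 2]), (2, 'AAA', 'CGA', [2, 1])]
```

In this toy example the serine codons TCT and AGT encode the same amino acid, yet every stepwise path between them passes through a different amino acid (Thr via ACT or Cys via TGT) and so implies two non-synonymous substitutions under a one-change-at-a-time model, whereas a single MNM could produce the same difference in one event.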


Acknowledgements

We are grateful to the members of the Thornton laboratory for discussion and helpful comments. We thank the Beagle2, Midway2 and Tarbell supercomputing clusters at the University of Chicago. We also thank the developers of HyPhy for providing an open-source platform that allows customization of standard analyses. Funding was provided by NIH R01GM104397 and R01GM121931 (to J.W.T.), NSF DEB-1601781 (to J.W.T. and A.V.), NSF DBI-1564611 (to M.W.H.), and the Precision Health Initiative of Indiana University (to M.W.H.).

Author information

Contributions

The analyses were designed by all authors, performed by A.V. and interpreted by all authors. The manuscript was written by A.V. and J.W.T. with contributions from M.W.H.

Corresponding author

Correspondence to Joseph W. Thornton.

Ethics declarations

Competing interests

The authors declare no competing interests.

About this article

Cite this article

Venkat, A., Hahn, M.W. & Thornton, J.W. Multinucleotide mutations cause false inferences of lineage-specific positive selection. Nat Ecol Evol 2, 1280–1288 (2018). https://doi.org/10.1038/s41559-018-0584-5
