Frequency of word-use predicts rates of lexical evolution throughout Indo-European history


Greek speakers say “ουρά”, Germans “schwanz” and the French “queue” to describe what English speakers call a ‘tail’, but all of these languages use a related form of ‘two’ to describe the number after one. Among more than 100 Indo-European languages and dialects, the words for some meanings (such as ‘tail’) evolve rapidly, being expressed across languages by dozens of unrelated words, while others evolve much more slowly, such as the number ‘two’, for which all Indo-European language speakers use the same related word-form¹. No general linguistic mechanism has been advanced to explain this striking variation in rates of lexical replacement among meanings. Here we use four large and divergent language corpora (English², Spanish³, Russian⁴ and Greek⁵) and a comparative database of 200 fundamental vocabulary meanings in 87 Indo-European languages⁶ to show that the frequency with which these words are used in modern language predicts their rate of replacement over thousands of years of Indo-European language evolution. Across all 200 meanings, frequently used words evolve at slower rates and infrequently used words evolve more rapidly. This relationship holds separately and identically across parts of speech for each of the four language corpora, and accounts for approximately 50% of the variation in historical rates of lexical replacement. We propose that the frequency with which specific words are used in everyday language exerts a general and law-like influence on their rates of evolution. Our findings are consistent with social models of word change that emphasize the role of selection, and suggest that owing to the ways that humans use language, some words will evolve slowly and others rapidly across all languages.
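As a rough illustration of what "accounts for approximately 50% of the variation" means, the sketch below fits an ordinary least-squares line of replacement rate on log word frequency and reports the coefficient of determination (R²), the share of variance in rate explained by frequency. It is not the authors' analysis: the log transform, the function name `fit_line` and every number in the two lists are assumptions made up for illustration, and the step of estimating replacement rates from the language data is left out entirely.

```python
import math


def fit_line(xs, ys):
    """Return (slope, intercept, r2) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a single predictor, R^2 equals the squared Pearson correlation.
    r2 = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r2


# Hypothetical toy values, NOT data from the paper:
# uses per million words, and an arbitrary-unit replacement rate per meaning.
freq_per_million = [10, 30, 100, 300, 1000, 3000, 10000, 30000]
replacement_rate = [4.1, 2.6, 3.3, 1.9, 2.4, 1.2, 1.6, 0.5]

# Assumption: frequency enters the fit on a log10 scale.
log_freq = [math.log10(f) for f in freq_per_million]
slope, intercept, r2 = fit_line(log_freq, replacement_rate)

print(f"slope = {slope:.3f} (negative: frequent words change more slowly)")
print(f"R^2   = {r2:.3f} (fraction of rate variance explained by frequency)")
```

An R² near 0.5 in a fit of this kind would correspond to the "approximately 50%" figure quoted above; the toy numbers here are chosen only to produce a negative slope and carry no empirical weight.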

Figure 1: Frequency plots for rates of lexical evolution in Indo-European across 200 fundamental vocabulary meanings.
Figure 2: Distribution of frequency of meaning-use for 200 meanings in four Indo-European languages.
Figure 3: Frequency of meaning-use plotted against estimated rate of lexical evolution for 200 basic meanings in four Indo-European languages.


  1. Kruskal, J. B., Dyen, I. & Black, P. D. in Mathematics in the Archaeological and Historical Sciences (eds Hodson, F. R., Kendall, D. G. & Tautu, P.) 361–380 (Edinburgh Univ. Press, Edinburgh, UK, 1971)

  2. Leech, G., Rayson, P. & Wilson, A. Word Frequencies in Written and Spoken English: based on the British National Corpus (Longman, London, 2001)

  3. Davies, M. Corpus del Español. 〈〉 (2001–02)

  4. Sharoff, S. in Corpus Linguistics Around the World (eds Archer, D., Wilson, A. & Rayson, P.) 167–180 (Rodopi, Amsterdam, 2005)

  5. Institute for Language and Speech Processing (ILSP) Corpus. Hellenic National Corpus (HNC) Web Version 3.0 [in Greek]. 〈〉 (1999–2006)

  6. Gray, R. D. & Atkinson, Q. D. Language-tree divergence times support the Anatolian theory of Indo-European origin. Nature 426, 435–439 (2003)

  7. Gordon, R. G. Ethnologue: Languages of the World 15th edn (SIL International, Dallas, 2005)

  8. Swadesh, M. Lexico-statistic dating of prehistoric ethnic contacts. Proc. Am. Phil. Soc. 96, 453–463 (1952)

  9. Dyen, I., James, A. T. & Cole, J. W. L. Language divergence and estimated word retention rate. Language 43, 150–171 (1967)

  10. Pagel, M. & Meade, A. in Phylogenetic Methods and the Prehistory of Languages (eds Clackson, J., Forster, P. & Renfrew, C.) 173–182 (McDonald Institute for Archaeological Research, Cambridge, UK, 2006)

  11. Labov, W. Principles of Linguistic Change: Social Factors (Blackwell, Oxford, UK, 2001)

  12. Milroy, J. & Milroy, L. Linguistic change, social network and speaker innovation. J. Linguist. 21, 229–284 (1985)

  13. Nettle, D. Is the rate of linguistic change constant? Lingua 108, 119–136 (1999)

  14. Thomason, S. G. & Kaufman, T. Language Contact, Creolization, and Genetic Linguistics (Univ. California Press, Berkeley, 1988)

  15. Kimura, M. The Neutral Theory of Molecular Evolution (Cambridge Univ. Press, Cambridge, UK, 1983)

  16. Boyd, R. & Richerson, P. J. Culture and the Evolutionary Process (Univ. Chicago Press, Chicago, 1985)

  17. Kirby, S. Function, Selection, and Innateness: the Emergence of Language Universals (Oxford Univ. Press, Oxford, UK, 1999)

  18. Croft, W. Explaining Language Change: an Evolutionary Approach (Longman, Harlow, UK, 2000)

  19. Gimbutas, M. The beginning of the Bronze Age in Europe and the Indo-Europeans 3500–2500 B.C. J. Indo-Eur. Stud. 1, 163–214 (1973)

  20. Renfrew, C. Archaeology and Language: the Puzzle of Indo-European Origins (Cape, London, 1987)

  21. Anderson, J. R. Acquisition of cognitive skill. Psychol. Rev. 89, 369–406 (1982)

  22. Ellis, N. C. Frequency effects in language processing: A review with implications for theories of implicit and explicit language acquisition. Stud. Second Lang. Acquisit. 24, 143–188 (2002)

  23. Fontanari, J. F. & Perlovsky, L. I. Solvable null model for the distribution of word frequencies. Phys. Rev. E 70, 042901 (2004)

  24. Zipf, G. K. Prehistoric 'cultural strata' in the evolution of Germanic: The case of Gothic. Mod. Lang. Notes 62, 522–530 (1947)

  25. Francis, W. N., Kučera, H. & Mackie, A. W. Frequency Analysis of English Usage: Lexicon and Grammar (Houghton Mifflin, Boston, 1982)

  26. Burger, J., Kirchner, M., Bramanti, B., Haak, W. & Thomas, M. G. Absence of the lactase-persistence-associated allele in early Neolithic Europeans. Proc. Natl Acad. Sci. USA 104, 3736–3741 (2007)

  27. Mackay, W. & Kondrak, G. in Proceedings of the 9th Conf. on Computational Natural Language Learning (CoNLL) 40–47 (ACL, Stroudsburg, PA, 2005)

  28. Greenberg, J. H. Indo-European and its Closest Relatives: The Eurasiatic Language Family Vol. 1, Grammar (Stanford Univ. Press, Stanford, CA, 2000)

  29. Kaiser, M. & Shevoroshkin, V. Nostratic. Annu. Rev. Anthropol. 17, 309–329 (1988)

  30. Pagel, M. & Meade, A. A phylogenetic mixture model for detecting pattern-heterogeneity in gene sequence or character-state data. Syst. Biol. 53, 571–581 (2004)

  31. Dyen, I., Kruskal, J. B. & Black, P. An Indo-European classification, a lexicostatistical experiment. 1. Trans. Am. Phil. Soc. 82, 1–132 (1992)

  32. Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H. & Teller, E. Equation of state calculations by fast computing machines. J. Chem. Phys. 21, 1087–1091 (1953)

  33. Pagel, M. & Meade, A. in Mathematics of Evolution and Phylogeny (ed. Gascuel, O.) 121–139 (Oxford Univ. Press, New York, 2005)

  34. Pagel, M. & Meade, A. in The Evolution of Cultural Diversity: a Phylogenetic Approach (eds Mace, R., Holden, C. J. & Shennan, S.) 235–256 (UCL Press, London, 2005)

  35. Yang, Z. Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. J. Mol. Evol. 39, 306–314 (1994)

  36. Steel, M. A., Hendy, M. D. & Penny, D. Loss of information in genetic distances. Nature 336, 118 (1988)

We thank R. Gray and S. Greenhill for comments and advice. This research was supported by a grant to M.P. from the Leverhulme Trust.

Author information

Corresponding author

Correspondence to Mark Pagel.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Information

The file contains Supplementary Figures S1 and S2 with Legends, Supplementary Tables S1 and S2, Supplementary Discussion and additional references. The file was modified on 19 October 2007 to correct an error in the title of Table S1 (PDF 2280 kb).

About this article

Cite this article

Pagel, M., Atkinson, Q. & Meade, A. Frequency of word-use predicts rates of lexical evolution throughout Indo-European history. Nature 449, 717–720 (2007).
