Letter

Frequency of word-use predicts rates of lexical evolution throughout Indo-European history

Nature volume 449, pages 717–720 (11 October 2007)


Greek speakers say “ουρά”, Germans “schwanz” and the French “queue” to describe what English speakers call a ‘tail’, but all of these languages use a related form of ‘two’ to describe the number after one. Among more than 100 Indo-European languages and dialects, the words for some meanings (such as ‘tail’) evolve rapidly, being expressed across languages by dozens of unrelated words, while others evolve much more slowly, such as the number ‘two’, for which all Indo-European language speakers use the same related word-form¹. No general linguistic mechanism has been advanced to explain this striking variation in rates of lexical replacement among meanings. Here we use four large and divergent language corpora (English², Spanish³, Russian⁴ and Greek⁵) and a comparative database of 200 fundamental vocabulary meanings in 87 Indo-European languages⁶ to show that the frequency with which these words are used in modern language predicts their rate of replacement over thousands of years of Indo-European language evolution. Across all 200 meanings, frequently used words evolve at slower rates and infrequently used words evolve more rapidly. This relationship holds separately and identically across parts of speech for each of the four language corpora, and accounts for approximately 50% of the variation in historical rates of lexical replacement. We propose that the frequency with which specific words are used in everyday language exerts a general and law-like influence on their rates of evolution. Our findings are consistent with social models of word change that emphasize the role of selection, and suggest that owing to the ways that humans use language, some words will evolve slowly and others rapidly across all languages.
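One way to read the statement that frequency "accounts for approximately 50% of the variation" is as a coefficient of determination (R²) of roughly 0.5 in a regression of replacement rate on word frequency. The sketch below illustrates that calculation under stated assumptions: it fits an ordinary least-squares line to log-transformed frequency, which is a choice made here for illustration and not a description of the authors' procedure. The function name `fit_line` and every number in the toy lists are hypothetical placeholders, not values from the paper.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared), where r_squared is the share
    of the variance in ys explained by the fitted line.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical toy values, NOT data from the paper.
# Usage frequency per million words for five unnamed meanings:
toy_frequency = [10.0, 50.0, 200.0, 1000.0, 5000.0]
# Invented replacement rates for the same five meanings (arbitrary units):
toy_rate = [0.9, 0.7, 0.5, 0.35, 0.1]

# Assumption: frequency is log-transformed before fitting.
log_freq = [math.log10(f) for f in toy_frequency]

slope, intercept, r2 = fit_line(log_freq, toy_rate)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={r2:.3f}")
```

A negative slope corresponds to the pattern described in the abstract, with frequently used words evolving more slowly. The toy numbers were chosen only to make that direction visible and say nothing about the size of the real effect.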



  1. in Mathematics in the Archaeological and Historical Sciences (eds Hodson, F. R., Kendall, D. G. & Tautu, P.) 361–380 (Edinburgh Univ. Press, Edinburgh, UK, 1971)
  2. Word Frequencies in Written and Spoken English: based on the British National Corpus (Longman, London, 2001)
  3. Corpus del Español (2001–02)
  4. in Corpus Linguistics Around the World (eds Archer, D., Wilson, A. & Rayson, P.) 167–180 (Rodopi, Amsterdam, 2005)
  5. Hellenic National Corpus (HNC) Web Version 3.0 [in Greek] (1999–2006)
  6. Language-tree divergence times support the Anatolian theory of Indo-European origin. Nature 426, 435–439 (2003)
  7. Ethnologue: Languages of the World 15th edn (SIL International, Dallas, 2005)
  8. Lexico-statistic dating of prehistoric ethnic contacts. Proc. Am. Phil. Soc. 96, 453–463 (1952)
  9. Language divergence and estimated word retention rate. Language 43, 150–171 (1967)
  10. in Phylogenetic Methods and the Prehistory of Languages (eds Clackson, J., Forster, P. & Renfrew, C.) 173–182 (McDonald Institute for Archaeological Research, Cambridge, UK, 2006)
  11. Principles of Linguistic Change: Social Factors (Blackwell, Oxford, UK, 2001)
  12. Linguistic change, social network and speaker innovation. J. Linguist. 21, 229–284 (1985)
  13. Is the rate of linguistic change constant? Lingua 108, 119–136 (1999)
  14. Language Contact, Creolization, and Genetic Linguistics (Univ. California Press, Berkeley, 1988)
  15. The Neutral Theory of Molecular Evolution (Cambridge Univ. Press, Cambridge, UK, 1983)
  16. Culture and the Evolutionary Process (Univ. Chicago Press, Chicago, 1985)
  17. Function, Selection, and Innateness: the Emergence of Language Universals (Oxford Univ. Press, Oxford, UK, 1999)
  18. Explaining Language Change: an Evolutionary Approach (Longman, Harlow, UK, 2000)
  19. The beginning of the Bronze Age in Europe and the Indo-Europeans 3500–2500 B.C. J. Indo-Eur. Stud. 1, 163–214 (1973)
  20. Archaeology and Language: the Puzzle of Indo-European Origins (Cape, London, 1987)
  21. Acquisition of cognitive skill. Psychol. Rev. 89, 369–406 (1982)
  22. Frequency effects in language processing: A review with implications for theories of implicit and explicit language acquisition. Stud. Second Lang. Acquisit. 24, 143–188 (2002)
  23. Solvable null model for the distribution of word frequencies. Phys. Rev. E 70, 042901 (2004)
  24. Prehistoric 'cultural strata' in the evolution of Germanic: The case of Gothic. Mod. Lang. Notes 62, 522–530 (1947)
  25. Frequency Analysis of English Usage: Lexicon and Grammar (Houghton Mifflin, Boston, 1982)
  26. Absence of the lactase-persistence-associated allele in early Neolithic Europeans. Proc. Natl Acad. Sci. USA 104, 3736–3741 (2007)
  27. in Proceedings of the 9th Conf. on Computational Natural Language Learning (CoNLL) 40–47 (ACL, Stroudsburg, PA, 2005)
  28. Indo-European and its Closest Relatives: The Eurasiatic Language Family Vol. 1, Grammar (Stanford Univ. Press, Stanford, CA, 2000)
  29. Nostratic. Annu. Rev. Anthropol. 17, 309–329 (1988)
  30. A phylogenetic mixture model for detecting pattern-heterogeneity in gene sequence or character-state data. Syst. Biol. 53, 571–581 (2004)
  31. An Indo-European classification, a lexicostatistical experiment. 1. Trans. Am. Phil. Soc. 82, 1–132 (1992)
  32. Equations of state calculations by fast computing machines. J. Chem. Phys. 21, 1087–1091 (1953)
  33. in Mathematics of Evolution and Phylogeny (ed. Gascuel, O.) 121–139 (Oxford Univ. Press, New York, 2005)
  34. in The Evolution of Cultural Diversity: a Phylogenetic Approach (eds Mace, R., Holden, C. J. & Shennan, S.) 235–256 (UCL Press, London, 2005)
  35. Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. J. Mol. Evol. 39, 306–314 (1994)
  36. Loss of information in genetic distances. Nature 336, 118 (1988)



We thank R. Gray and S. Greenhill for comments and advice. This research was supported by a grant to M.P. from the Leverhulme Trust.

Author information


  1. School of Biological Sciences, University of Reading, Whiteknights, Reading, Berkshire, RG6 6AS, UK

    • Mark Pagel
    • Quentin D. Atkinson
    • Andrew Meade
  2. Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, New Mexico 87501, USA

    • Mark Pagel


Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to Mark Pagel.

Supplementary information

PDF files

  1. Supplementary Information

    The file contains Supplementary Figures S1 and S2 with Legends, Supplementary Tables S1 and S2, Supplementary Discussion and additional references. The file was modified on 19 October 2007 to correct an error in the title of Table S1.
