Human language as a culturally transmitted replicator

Key Points

  • There are currently about 7,000 living human languages. Language diversity may have peaked around 10,000 years ago, when as many as 20,000 different languages might have been spoken.

  • Languages evolve by a process of descent with modification that is remarkably similar to the evolution of biological species, and the two systems share many analogies: words correspond to genes, and borrowing corresponds to lateral gene transfer.

  • It is possible to construct family trees or phylogenies of languages that retrace the history of descent with modification of language families, such as the Indo-European languages. The resulting phylogenies are surprisingly tree-like, which shows that, despite the possibility of acquiring words from other languages, the majority of language elements are stably and vertically transmitted.

  • Languages show remarkable fidelity in their transmission, sometimes rivalling that of genes, despite being a culturally transmitted replicator that is subject to myriad population and social influences.

  • Words vary at least 100-fold in the rate at which new unrelated forms come to replace older words: there are 15 different ways to say 'bird' in Indo-European languages, but all of the ways of saying 'two' are related.

  • Words that are used at the highest frequencies in everyday speech are among the most conserved across languages, and some words have related forms that may go back over 10,000 years.

  • Language may act socially to reinforce group membership and identity. When a language initially divides into two distinct speech communities there may be a period of rapid change that serves to distinguish the two nascent languages.

  • Of the six possible ways that languages can order the subject (S), verb (V) and object (O) in a sentence, the SVO and SOV orders predominate in the world's languages. Word order has probably co-evolved over thousands of years with the way that a language modifies sentence objects.

Abstract

Human languages form a distinct and largely independent class of cultural replicators with behaviour and fidelity that can rival that of genes. Parallels between biological and linguistic evolution mean that statistical methods inspired by phylogenetics and comparative biology are being increasingly applied to study language. Phylogenetic trees constructed from linguistic elements chart the history of human cultures, and comparative studies reveal surprising and general features of how languages evolve, including patterns in the rates of evolution of language elements and social factors that influence temporal trends of language evolution. For many comparative questions of anthropology and human behavioural ecology, historical processes estimated from linguistic phylogenies may be more relevant than those estimated from genes.

Figure 1: Tree of Indo-European languages.
Figure 2: Rates of lexical replacement.
Figure 3: Rates of lexical replacement are stable among language families.
Figure 4: Relationships between language and species distribution.
Figure 5: Evolution of word order changes.

Acknowledgements

I thank C. Venditti, A. Calude, I. Peiros, A. Meade, Q. Atkinson, M. Ruhlen, M. Cysouw and M. Haspelmath for help, comments and suggestions. Grants to M.P. from the Leverhulme Trust and the Natural Environment Research Council supported this work.

Related links

FURTHER INFORMATION

Mark Pagel's homepage

Austronesian Basic Vocabulary Database

SplitsTree

Swadesh list

World Atlas of Linguistic Structures

Glossary

Languages

Linguists identify two languages as distinct when, according to various criteria, they become mutually unintelligible.

Phylogeny

A branching diagram describing the set of ancestral–descendant relationships among a group of species or languages.

Borrowing

The acquisition of a new non-cognate word from another language.

Phoneme

Phonemes are characteristically thought of as the smallest units of speech sound that are distinguished by the speakers of a particular language. They are not universal, but act as the fundamental building blocks from which all of the words of a given language are produced.

Cognate

Two words are deemed cognate if they derive by a process of descent with modification from a common ancestral word.

Sum over histories

A mathematical technique that accounts for all possible ancestral states (that is, all possible histories) when finding the likelihood of observing the gene sequence or other data among extant species.
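The idea can be sketched on a toy example. The code below is a minimal illustration, not the method used in the article: it assumes a hypothetical three-language tree ((A,B),C), a symmetric two-state model (cognate present or absent), equal branch lengths and an equal prior on the root state. The function names and the rate value are illustrative assumptions.

```python
import math
from itertools import product


def p_change(i, j, rate, t):
    """Transition probability under an assumed symmetric two-state model."""
    same = 0.5 + 0.5 * math.exp(-2.0 * rate * t)
    return same if i == j else 1.0 - same


def likelihood_sum_over_histories(tips, rate, t):
    """Likelihood of tip states on the hypothetical tree ((A,B),C).

    Internal nodes: the root and X, the ancestor of A and B.
    Every branch is assumed to have the same length t.
    """
    total = 0.0
    # Enumerate every possible history, i.e. every assignment of
    # states to the two unobserved ancestral nodes.
    for root, x in product((0, 1), repeat=2):
        prob = 0.5  # assumed equal prior on the root state
        prob *= p_change(root, x, rate, t)
        prob *= p_change(x, tips["A"], rate, t)
        prob *= p_change(x, tips["B"], rate, t)
        prob *= p_change(root, tips["C"], rate, t)
        total += prob
    return total


if __name__ == "__main__":
    observed = {"A": 1, "B": 1, "C": 0}
    print(likelihood_sum_over_histories(observed, rate=0.1, t=1.0))
```

On larger trees the same sum is normally computed recursively rather than by brute-force enumeration, because the number of histories grows exponentially with the number of internal nodes.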

Parsimony

When applied to phylogenetic inference in a linguistic context, parsimony is a method that seeks the phylogenetic tree that implies the fewest changes among cognate classes.
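As a rough sketch of how such a count can be made, the code below scores two candidate trees with the Fitch algorithm, one standard way of counting the minimum number of changes for unordered characters. The four languages, the cognate-class labels and the function names are hypothetical and chosen only for illustration.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum changes) for one character.

    tree is a nested tuple of language names, e.g. (("A", "B"), ("C", "D")).
    states maps each language name to its cognate class for one meaning.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        # The two subtrees can agree on a state: no extra change needed.
        return common, left_cost + right_cost
    # Otherwise at least one change is required on this part of the tree.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, characters):
    """Total number of changes implied by a tree across all meanings."""
    return sum(fitch(tree, states)[1] for states in characters)


# Hypothetical cognate classes for three meanings in four languages.
characters = [
    {"A": "x", "B": "x", "C": "y", "D": "y"},
    {"A": "p", "B": "q", "C": "p", "D": "q"},
    {"A": "m", "B": "m", "C": "n", "D": "n"},
]
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))

if __name__ == "__main__":
    print(parsimony_score(tree_1, characters), parsimony_score(tree_2, characters))
```

For these made-up data the first tree implies fewer changes, so a parsimony search would prefer it.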

Distance

As applied to phylogenetic inference in a linguistic context, distance is a set of methods that infer an underlying phylogenetic tree from a matrix of the pair-wise differences among all languages.
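A minimal sketch of the first step, building the pairwise matrix, is given below. It assumes that distance is measured as the proportion of shared meanings whose words fall in different cognate classes, which is only one of several possible measures. The language labels, meanings and class codes are hypothetical.

```python
from itertools import combinations


def pairwise_distances(cognates):
    """Proportion of shared meanings with non-cognate words, per language pair.

    cognates maps language -> {meaning: cognate class}. Every pair of
    languages is assumed to share at least one meaning.
    """
    dist = {}
    for a, b in combinations(sorted(cognates), 2):
        shared = [m for m in cognates[a] if m in cognates[b]]
        differing = sum(cognates[a][m] != cognates[b][m] for m in shared)
        dist[(a, b)] = differing / len(shared)
    return dist


# Hypothetical data: integers label cognate classes within each meaning.
cognates = {
    "L1": {"two": 1, "bird": 1, "water": 1, "hand": 1},
    "L2": {"two": 1, "bird": 2, "water": 1, "hand": 1},
    "L3": {"two": 1, "bird": 3, "water": 2, "hand": 2},
}

if __name__ == "__main__":
    for pair, d in pairwise_distances(cognates).items():
        print(pair, d)
```

A tree-building method such as neighbour joining would then take this matrix as input; that step is not shown here.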

Likelihood

A statistical quantity defined as an amount that is proportional to the probability of observing some set of data given a particular model of how those data arose. In linguistic phylogenetic applications one finds the likelihood of the lexical data on the proposed tree given some model of how words evolve.

Maximum likelihood method

A statistical technique for finding the parameters of a model that make the observed data most likely or probable under that model.
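The sketch below illustrates likelihood and its maximization on a deliberately simple model, not the models used for real language data. It assumes that each word is replaced at a single constant rate, so that the probability a word is retained after time t is exp(-rate * t), and it uses made-up counts. The grid search stands in for a proper optimizer.

```python
import math


def log_likelihood(rate, retained, total, t):
    """Log-likelihood of observing `retained` of `total` words unchanged.

    Assumes retention probability exp(-rate * t) for every word and
    0 < retained < total.
    """
    p_retained = math.exp(-rate * t)
    return retained * (-rate * t) + (total - retained) * math.log1p(-p_retained)


def ml_rate_grid(retained, total, t, lo=1e-4, hi=2.0, steps=20000):
    """Find the rate that maximizes the log-likelihood on a regular grid."""
    best_rate = lo
    best_ll = log_likelihood(lo, retained, total, t)
    for i in range(1, steps + 1):
        rate = lo + (hi - lo) * i / steps
        ll = log_likelihood(rate, retained, total, t)
        if ll > best_ll:
            best_rate, best_ll = rate, ll
    return best_rate


def ml_rate_closed_form(retained, total, t):
    """Analytical maximum for this simple model: -ln(retained / total) / t."""
    return -math.log(retained / total) / t


if __name__ == "__main__":
    # Hypothetical data: 160 of 200 words retained over 2 units of time.
    print(ml_rate_grid(160, 200, 2.0), ml_rate_closed_form(160, 200, 2.0))
```

In this toy case the numerical and analytical answers agree closely; for models of evolution on a tree no closed form exists and numerical search is the norm.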

Markov chain Monte Carlo (MCMC)

A statistical method for searching a complex high-dimensional space. As applied to phylogenetic inference in a linguistic context, MCMC methods return a sample of trees that are statistically representative of the trees that might arise from a given model of how words evolve.
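The sampling idea can be shown with a much simpler target than a space of trees. The sketch below is a Metropolis sampler for a single replacement rate under an assumed constant-rate retention model with a flat prior; it does not sample phylogenies. The data, the step size, the starting value and the lower bound on the rate are all illustrative assumptions.

```python
import math
import random


def log_likelihood(rate, retained, total, t):
    """Log-likelihood under an assumed retention probability exp(-rate * t)."""
    if rate <= 1e-9:  # assumed lower bound; keeps the logarithm finite
        return float("-inf")
    p_retained = math.exp(-rate * t)
    return retained * (-rate * t) + (total - retained) * math.log1p(-p_retained)


def metropolis(retained, total, t, n_iter=20000, step=0.02, seed=1):
    """Return a chain of sampled rates (flat prior on the rate assumed)."""
    rng = random.Random(seed)
    rate = 0.5  # arbitrary starting value
    current = log_likelihood(rate, retained, total, t)
    samples = []
    for _ in range(n_iter):
        proposal = rate + rng.gauss(0.0, step)  # symmetric random-walk proposal
        candidate = log_likelihood(proposal, retained, total, t)
        # Accept with probability min(1, likelihood ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            rate, current = proposal, candidate
        samples.append(rate)
    return samples


if __name__ == "__main__":
    chain = metropolis(160, 200, 2.0)
    kept = chain[2000:]  # discard an assumed burn-in period
    print(sum(kept) / len(kept))
```

In a phylogenetic setting the state of the chain would be a tree together with its model parameters, and each proposal would alter the tree or a parameter, but the accept or reject step follows the same logic.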

Indo-European languages

A family of related languages that derive from a common ancestral language that probably arose in Anatolia around 8,000 years ago and then spread throughout Europe, India, and what is now Afghanistan, Pakistan and Iran.

Monophyletic

In a phylogenetic context, a group of species (or languages) is monophyletic if they derive from a common ancestor not shared with any other species (or languages). The Germanic languages are monophyletic and are distinct from the monophyletic group of Romance languages. Monophyly implies that the group has just one origin.

Bantu languages

A group of approximately 500 languages that is part of the larger Niger-Congo language family. Bantu languages probably arose 3,000 years ago in West Africa, possibly close to present-day Cameroon, and then spread east and then south, eventually reaching present-day South Africa.

Clade

In the context of languages, a clade is a group of related languages that comprises a common ancestral language and all of its descendants.

Lexical replacement

The rate of lexical replacement is the rate at which a word is replaced by a new non-cognate word.

Language year

In a phylogenetic context, each of the branches of a phylogeny represents some amount of evolution that occurs independently of the evolution in other branches. If the times in years that these branches represent are added together, the result records the total number of years of evolution that the tree represents; that is, the total number of language years.
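The arithmetic can be illustrated with a small, entirely hypothetical tree. In the sketch below each node is written as a pair of its branch length in years and a list of child nodes; the representation and the numbers are assumptions made for the example.

```python
def total_language_years(node):
    """Sum the branch lengths (in years) over a whole tree.

    node = (branch_length_in_years, [child nodes]); tips have no children.
    """
    length, children = node
    return length + sum(total_language_years(child) for child in children)


# Hypothetical tree: the root has no branch above it (length 0). One lineage
# runs 2,000 years and then splits into two 1,000-year branches; the other
# runs 3,000 years without splitting.
tree = (0, [(2000, [(1000, []), (1000, [])]), (3000, [])])

if __name__ == "__main__":
    print(total_language_years(tree))
```

Although the tree spans only 3,000 calendar years from root to tips, it represents 7,000 language years of independent evolution.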

Gamma correction

An elegant mathematical technique developed for characterizing the evolution of gene sequences that allows the nucleotides at different sites in the gene to evolve or be replaced at varying rates. The same technique can be applied to characterize the differing rates of evolution among lexical items.

Linguistic universals

A set of features of language and relationships among those features that the great comparative linguist Joseph Greenberg proposed would be found in all or nearly all languages, or which would at least show statistical evidence for being linked.

Word order

The typical order of subjects, verbs and objects in a sentence.

Pre- versus postpositioning

Whether a language places the phrase that modifies a sentence object before (preposition) or after (postposition) that object in the sentence.

Cite this article

Pagel, M. Human language as a culturally transmitted replicator. Nat Rev Genet 10, 405–415 (2009). https://doi.org/10.1038/nrg2560
