There are currently ∼7,000 different living human languages. The peak of language diversity may have been ∼10,000 years ago when up to 20,000 different languages might have been spoken.
Languages evolve by a process of descent with modification that is remarkably similar to the evolution of biological species, and the two show many parallels: words are analogous to genes, and borrowing is analogous to lateral gene transfer.
It is possible to construct family trees or phylogenies of languages that retrace the history of descent with modification of language families, such as the Indo-European languages. These trees are surprisingly tree-like, which shows that, despite the possibility of acquiring words from other languages, the majority of language elements are stably and vertically transmitted.
Languages show remarkable fidelity in their transmission, sometimes rivalling that of genes, despite being culturally transmitted replicators that are subject to myriad population and social influences.
Words vary at least 100-fold in the rate at which new unrelated forms come to replace older words: there are 15 different ways to say 'bird' in Indo-European languages, but all of the ways of saying 'two' are related.
Words that are used at the highest frequencies in everyday speech are among the most conserved across languages, and some words have related forms that may go back over 10,000 years.
Language may act socially to reinforce group membership and identity. When a language initially divides into two distinct speech communities there may be a period of rapid change that serves to distinguish the two nascent languages.
Of the six possible ways that languages can order the subject (S), verb (V) and object (O) in a sentence, the SVO and SOV orders predominate in the world's languages. Word order has probably co-evolved over thousands of years with the way that a language modifies sentence objects.
Human languages form a distinct and largely independent class of cultural replicators with behaviour and fidelity that can rival that of genes. Parallels between biological and linguistic evolution mean that statistical methods inspired by phylogenetics and comparative biology are being increasingly applied to study language. Phylogenetic trees constructed from linguistic elements chart the history of human cultures, and comparative studies reveal surprising and general features of how languages evolve, including patterns in the rates of evolution of language elements and social factors that influence temporal trends of language evolution. For many comparative questions of anthropology and human behavioural ecology, historical processes estimated from linguistic phylogenies may be more relevant than those estimated from genes.
I thank C. Venditti, A. Calude, I. Peiros, A. Meade, Q. Atkinson, M. Ruhlen, M. Cysouw and M. Haspelmath for help, comments and suggestions. Grants to M.P. from the Leverhulme Trust and the Natural Environment Research Council supported this work.
- Language
Linguists identify two languages as distinct when, according to various criteria, they become mutually unintelligible.
- Phylogeny
A branching diagram describing the set of ancestral–descendant relationships among a group of species or languages.
- Borrowing
The acquisition of a new non-cognate word from another language.
- Phoneme
Characteristically thought of as the smallest units of speech-sounds that are distinguished by the speakers of a particular language. Phonemes are not universal, but act as the fundamental building blocks to produce all of the words of a given language.
- Cognate
Two words are deemed cognate if they derive by a process of descent with modification from a common ancestral word.
- Sum over histories
A mathematical technique that accounts for all possible ancestral states (that is, all possible histories) when finding the likelihood of observing the gene sequence or other data among extant species.
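As an illustration of summing over histories, the sketch below computes the likelihood of one binary character (presence or absence of a cognate class) on a small three-language tree. It is a minimal sketch under stated assumptions, not the method of any particular study: the symmetric two-state model, the rate, the branch lengths and the tip states are all hypothetical. The recursive calculation is checked against an explicit enumeration of every possible ancestral history.

```python
import math
from itertools import product

STATES = (0, 1)  # 0 = cognate class absent, 1 = present


def transition_prob(start, end, rate, time):
    """P(end state | start state) under an assumed symmetric two-state model."""
    same = 0.5 + 0.5 * math.exp(-2.0 * rate * time)
    return same if start == end else 1.0 - same


def partial_likelihoods(node, rate):
    """Return P(tip data below node | node is in state s) for each s.

    A tip is an int (its observed state); an internal node is a list of
    (child, branch_length) pairs.
    """
    if isinstance(node, int):
        return [1.0 if s == node else 0.0 for s in STATES]
    partials = [1.0 for _ in STATES]
    for child, length in node:
        below = partial_likelihoods(child, rate)
        for s in STATES:
            # Sum over the unknown state at the far end of this branch.
            partials[s] *= sum(
                transition_prob(s, c, rate, length) * below[c] for c in STATES
            )
    return partials


def tree_likelihood(root, rate):
    """Average over root states, assuming each is equally likely a priori."""
    return sum(0.5 * p for p in partial_likelihoods(root, rate))


def enumerate_histories(rate):
    """Brute force for the example tree: add up every (root, inner) history."""
    total = 0.0
    for root_state, inner_state in product(STATES, repeat=2):
        total += (
            0.5
            * transition_prob(root_state, inner_state, rate, 0.5)
            * transition_prob(inner_state, 1, rate, 1.0)  # tip A
            * transition_prob(inner_state, 1, rate, 1.0)  # tip B
            * transition_prob(root_state, 0, rate, 1.5)   # tip C
        )
    return total


# Hypothetical tree ((A:1.0, B:1.0):0.5, C:1.5) with tip states A=1, B=1, C=0.
EXAMPLE_TREE = [([(1, 1.0), (1, 1.0)], 0.5), (0, 1.5)]
RATE = 0.3  # hypothetical rate of change per unit time

pruned = tree_likelihood(EXAMPLE_TREE, RATE)
brute = enumerate_histories(RATE)
print(pruned, brute)  # the two values agree up to rounding
```

The recursion visits each branch once, whereas the enumeration grows exponentially with the number of internal nodes, which is why the recursive form is preferred on larger trees.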
- Parsimony
When applied to phylogenetic inference in a linguistic context, parsimony is a method that seeks the phylogenetic tree that implies the fewest number of changes among cognate classes.
- Distance
As applied to phylogenetic inference in a linguistic context, distance is a set of methods that infer an underlying phylogenetic tree from a matrix of the pair-wise differences among all languages.
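The matrix of pair-wise differences that distance methods start from can be sketched as follows. This is only one simple choice of distance (the fraction of shared meanings whose cognate classes differ), and the three languages and their cognate-class labels are hypothetical placeholders rather than real data.

```python
def lexical_distance(lang_a, lang_b):
    """Fraction of shared meanings that fall in different cognate classes."""
    shared = [meaning for meaning in lang_a if meaning in lang_b]
    if not shared:
        raise ValueError("the two languages share no meanings")
    differing = sum(1 for meaning in shared if lang_a[meaning] != lang_b[meaning])
    return differing / len(shared)


def distance_matrix(languages):
    """Return {(name_x, name_y): distance} for every ordered pair of languages."""
    names = sorted(languages)
    return {
        (x, y): lexical_distance(languages[x], languages[y])
        for x in names
        for y in names
    }


# Hypothetical data: meaning -> cognate-class label for each language.
LANGUAGES = {
    "lang_A": {"two": 1, "bird": 1, "water": 1, "hand": 1},
    "lang_B": {"two": 1, "bird": 2, "water": 1, "hand": 1},
    "lang_C": {"two": 1, "bird": 3, "water": 2, "hand": 2},
}

matrix = distance_matrix(LANGUAGES)
for pair, value in sorted(matrix.items()):
    print(pair, value)
```

A tree-building step (for example a clustering algorithm) would then take this matrix as input; that step is not shown here.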
- Likelihood
A statistical quantity defined as an amount that is proportional to the probability of observing some set of data given a particular model of how those data arose. In linguistic phylogenetic applications one finds the likelihood of the lexical data on the proposed tree given some model of how words evolve.
- Maximum likelihood method
A statistical technique for finding the parameters of a model that make the observed data most likely or probable under that model.
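A toy example may make the idea concrete. The sketch below assumes, purely for illustration, that each of `TOTAL` meanings is independently replaced at a constant rate over a period `TIME`, so that the probability of replacement is 1 - exp(-rate x time). The counts and the time span are hypothetical. A grid search finds the rate that maximizes the log-likelihood, and the result is compared with the closed-form answer for this simple model.

```python
import math


def log_likelihood(rate, replaced, total, time):
    """Log-likelihood of seeing `replaced` of `total` words replaced after `time`."""
    p = 1.0 - math.exp(-rate * time)  # assumed probability that a word is replaced
    return replaced * math.log(p) + (total - replaced) * math.log(1.0 - p)


def grid_search_mle(replaced, total, time, step=0.0005, upper=2.0):
    """Return the rate on a regular grid that gives the highest log-likelihood."""
    n_points = int(round(upper / step))
    rates = [step * i for i in range(1, n_points + 1)]
    return max(rates, key=lambda r: log_likelihood(r, replaced, total, time))


# Hypothetical observations: 40 of 200 meanings replaced over 1 unit of time.
REPLACED, TOTAL, TIME = 40, 200, 1.0

grid_estimate = grid_search_mle(REPLACED, TOTAL, TIME)
closed_form = -math.log(1.0 - REPLACED / TOTAL) / TIME
print(grid_estimate, closed_form)
```

In real phylogenetic applications the likelihood has no closed-form maximum, so numerical optimization of this kind (usually far more efficient than a grid) is the norm.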
- Markov chain Monte Carlo
(MCMC). A statistical method for searching a complex high-dimensional space. As applied to phylogenetic inference in a linguistic context, MCMC methods return a sample of trees that are statistically representative of the trees that might arise from a given model of how words evolve.
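The sampling idea can be illustrated with a deliberately small case. Real applications sample trees and model parameters jointly; the sketch below instead samples a single replacement rate with a random-walk Metropolis algorithm, which is a simplification chosen for brevity. The replacement model, the flat prior, the counts, the step size and the seed are all assumptions made for this example.

```python
import math
import random


def log_posterior(rate, replaced, total, time):
    """Unnormalized log-posterior under an assumed flat prior on rate > 0."""
    if rate <= 0.0:
        return -math.inf
    p = -math.expm1(-rate * time)  # 1 - exp(-rate * time), stable for small rates
    return replaced * math.log(p) - (total - replaced) * rate * time


def metropolis(replaced, total, time, steps=6000, burn_in=1000,
               step_size=0.05, seed=1):
    """Random-walk Metropolis sampler; returns the post-burn-in rate samples."""
    rng = random.Random(seed)
    rate = 0.5  # arbitrary starting value
    current = log_posterior(rate, replaced, total, time)
    samples = []
    for i in range(steps):
        proposal = rate + rng.gauss(0.0, step_size)
        candidate = log_posterior(proposal, replaced, total, time)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            rate, current = proposal, candidate
        if i >= burn_in:
            samples.append(rate)
    return samples


# Hypothetical observations: 40 of 200 meanings replaced over 1 unit of time.
samples = metropolis(replaced=40, total=200, time=1.0)
posterior_mean = sum(samples) / len(samples)
print(len(samples), posterior_mean)
```

The retained samples approximate the posterior distribution of the rate, so their spread gives a direct measure of uncertainty in the same way that a posterior sample of trees conveys uncertainty about tree shape.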
- Indo-European languages
A family of related languages that derive from a common ancestral language that probably arose in Anatolia around 8,000 years ago and then spread throughout Europe, India, and what is now Afghanistan, Pakistan and Iran.
- Monophyletic
In a phylogenetic context, a group of species (or languages) is monophyletic if they derive from a common ancestor not shared with any other species (or languages). The Germanic languages are monophyletic and are distinct from the monophyletic group of Romance languages. Monophyly implies that the group has just one origin.
- Bantu languages
A group of approximately 500 languages that is part of the larger Niger-Congo language family. Bantu languages probably arose 3,000 years ago in West Africa, possibly close to present day Cameroon, and then spread east and then south eventually reaching to present day South Africa.
- Clade
In the context of languages, a clade is a group of related languages.
- Lexical replacement
The rate of lexical replacement is the rate at which a word is replaced by a new non-cognate word.
- Language year
In a phylogenetic context, each of the branches of a phylogeny represents some amount of evolution that occurs independently of the evolution in other branches. If the times in years that these branches represent are added together, the result records the total number of years of evolution that the tree represents; that is, the total number of language years.
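The calculation amounts to summing every branch length in the tree. A minimal sketch, assuming a tree stored as nested `(branch_length_in_years, children)` tuples with hypothetical branch lengths:

```python
def total_language_years(node):
    """Sum the branch leading to `node` and every branch below it."""
    branch_length, children = node
    return branch_length + sum(total_language_years(child) for child in children)


# Hypothetical tree: the root has no branch above it (length 0).
# One daughter lineage lasts 500 years and then splits into two languages
# that each persist for 1,000 years; the other lineage lasts 1,500 years.
EXAMPLE_TREE = (0, [
    (500, [(1000, []), (1000, [])]),
    (1500, []),
])

language_years = total_language_years(EXAMPLE_TREE)
print(language_years)  # 500 + 1000 + 1000 + 1500 = 4000
```

Note that the total (4,000 language years) exceeds the 1,500 years that separate the root from the present, because evolution along separate branches is counted independently.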
- Gamma correction
An elegant mathematical technique developed for characterizing the evolution of gene sequences that allows the nucleotides at different sites in the gene to evolve or be replaced at varying rates. The same technique can be applied to characterize the differing rates of evolution among lexical items.
- Linguistic universals
A set of features of language and relationships among those features that the great comparative linguist Joseph Greenberg proposed would be found in all or nearly all languages, or which would at least show statistical evidence for being linked.
- Word order
The typical order of subjects, verbs and objects in a sentence.
- Pre versus postpositioning
Whether a language places the phrase that modifies a sentence object before (preposition) or after (postposition) that object in the sentence.