Letter

The evolution of language families is shaped by the environment beyond neutral drift


There are more than 7,000 languages spoken in the world today [1]. It has been argued that the natural and social environment of languages drives this diversity [2–13]. However, a fundamental question remains: how strong are environmental pressures, and does neutral drift suffice as a mechanism to explain diversification? We estimate the phylogenetic signals of geographic dimensions, distance to water, climate and population size on more than 6,000 phylogenetic trees of 46 language families. Phylogenetic signals of environmental factors are generally stronger than expected under the null hypothesis of no relationship with the shape of family trees. Importantly, in most cases they are also not compatible with neutral drift models of constant-rate change across the family tree branches. Our results suggest that language diversification is driven by further adaptive and non-adaptive pressures. Language diversity cannot be understood without modelling the pressures that physical, ecological and social factors exert on language users in different environments across the globe.
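To make the notion of a phylogenetic signal concrete, the following is a minimal Python sketch of one common statistic, Blomberg's K, together with an exhaustive permutation test against the null hypothesis of no relationship between trait values and tree shape. It is an illustration only, not the authors' analysis code (their analyses are provided as R code files in the supplementary material). The four-tip tree, the trait values and all function names are hypothetical, and K is assumed to follow the standard covariance-matrix formulation. Under constant-rate Brownian motion the expected value of K is 1.

```python
from itertools import permutations


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        # Partial pivoting: move the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def blomberg_k(cov, trait):
    """Blomberg's K for one trait, given the phylogenetic covariance matrix.

    cov[i][j] is the branch length shared by tips i and j (symmetric).
    """
    n = len(trait)
    w = solve(cov, [1.0] * n)                  # C^-1 1
    sum_inv = sum(w)                           # 1' C^-1 1
    root = sum(wi * xi for wi, xi in zip(w, trait)) / sum_inv  # phylogenetic mean
    d = [xi - root for xi in trait]
    mse0 = sum(di * di for di in d)            # ignores the tree
    mse = sum(di * yi for di, yi in zip(d, solve(cov, d)))  # uses the tree
    trace = sum(cov[i][i] for i in range(n))
    expected = (trace - n / sum_inv) / (n - 1)  # MSE0/MSE expected under Brownian motion
    return (mse0 / mse) / expected


def permutation_p(cov, trait, tol=1e-9):
    """Share of all tip permutations with K at least as large as observed."""
    k_obs = blomberg_k(cov, trait)
    ks = [blomberg_k(cov, list(p)) for p in permutations(trait)]
    return sum(k >= k_obs - tol for k in ks) / len(ks)


# Hypothetical tree ((A:1,B:1):1,(C:1,D:1):1) as a shared-branch-length matrix.
COV = [
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 1.0],
    [0.0, 0.0, 1.0, 2.0],
]
CLUSTERED = [1.0, 1.2, 5.0, 5.1]  # sister languages share similar values
SCATTERED = [1.0, 5.0, 1.2, 5.1]  # sister languages differ strongly

if __name__ == "__main__":
    print("K (clustered):", blomberg_k(COV, CLUSTERED))
    print("K (scattered):", blomberg_k(COV, SCATTERED))
    print("permutation p (clustered):", permutation_p(COV, CLUSTERED))
```

In this toy case the clustered trait yields K above 1 and the scattered trait yields K below 1. Enumerating all permutations is only feasible for very small trees; with realistic family trees one would draw a random sample of permutations instead.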


Data availability

Data availability is detailed in Supplementary Methods 1. Individual data files are described in Supplementary Data 1–7 in the Guide to the Supplementary Information.

Additional information

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


References

1. Hammarström, H., Forkel, R., Haspelmath, M. & Bank, S. Glottolog 3.2 (Max Planck Institute for the Science of Human History, 2018); http://glottolog.org

2. Nichols, J. Linguistic diversity and the first settlement of the New World. Language 66, 475–521 (1990).

3. Nettle, D. Using social impact theory to simulate language change. Lingua 108, 95–117 (1999).

4. Coupé, C., Hombert, J.-M., Marsico, E. & Pellegrino, F. in East Flows the Great River: Festschrift in Honor of Prof. William S-Y. WANG on his 80th Birthday (eds Peng, G. & Shi, F.) 76–103 (City Univ. Hong Kong Press, Hong Kong, 2013).

5. Gavin, M. C. et al. Toward a mechanistic understanding of linguistic diversity. Bioscience 63, 524–535 (2013).

6. Mace, R. & Pagel, M. A latitudinal gradient in the density of human languages in North America. Proc. Biol. Sci. 261, 117–121 (1995).

7. Collard, I. F. & Foley, R. A. Latitudinal patterns and environmental determinants of recent human cultural diversity: do humans follow biogeographical rules? Evol. Ecol. Res. 4, 371–383 (2002).

8. Moore, J. L. et al. The distribution of cultural and biological diversity in Africa. Proc. Biol. Sci. 269, 1645–1653 (2002).

9. Dimmendaal, G. J. Language ecology and linguistic diversity on the African continent. Lang. Linguist. Compass 2, 840–858 (2008).

10. Axelsen, J. B. & Manrubia, S. River density and landscape roughness are universal determinants of linguistic diversity. Proc. Biol. Sci. 281, 20133029 (2014).

11. Gavin, M. C. & Sibanda, N. The island biogeography of languages. Glob. Ecol. Biogeogr. 21, 958–967 (2012).

12. Gavin, M. C. et al. Process-based modelling shows how climate and demography shape language diversity. Glob. Ecol. Biogeogr. 26, 584–591 (2017).

13. Currie, T. E. & Mace, R. Political complexity predicts the spread of ethnolinguistic groups. Proc. Natl Acad. Sci. USA 106, 7339–7344 (2009).

14. Cavalli-Sforza, L. L., Menozzi, P. & Piazza, A. The History and Geography of Human Genes (Princeton Univ. Press, Princeton, 1994).

15. Everett, C., Blasi, D. E. & Roberts, S. G. Climate, vocal folds, and tonal languages: connecting the physiological and geographic dots. Proc. Natl Acad. Sci. USA 112, 1322–1327 (2015).

16. Lupyan, G. & Dale, R. Why are there different languages? The role of adaptation in linguistic diversity. Trends Cogn. Sci. 20, 649–660 (2016).

17. Dediu, D., Janssen, R. & Moisik, S. R. Language is not isolated from its wider environment: vocal tract influences on the evolution of speech and language. Lang. Commun. 54, 9–20 (2017).

18. Welmers, W. E. African Language Structures (Univ. California Press, Berkeley & Los Angeles, 1973).

19. McMahon, A. M. Understanding Language Change (Cambridge Univ. Press, Cambridge, 1994).

20. Sapir, E. Language: An Introduction to the Study of Speech (Harcourt, Brace & World, New York, 1921).

21. Jones, M. C. & Singh, I. Exploring Language Change (Routledge, New York, 2005).

22. Blomberg, S. P., Garland, T. Jr., Ives, A. R. & Crespi, B. Testing for phylogenetic signal in comparative data: behavioral traits are more labile. Evolution 57, 717–745 (2003).

23. Symonds, M. R. & Blomberg, S. P. in Modern Phylogenetic Comparative Methods and Their Application in Evolutionary Biology (ed. Garamszegi, L. Z.) 105–130 (Springer, Heidelberg, 2014).

24. Verkerk, A. Diachronic change in Indo-European motion event encoding. J. Hist. Linguist. 4, 40–83 (2014).

25. Verkerk, A. The correlation between motion event encoding and path verb lexicon size in the Indo-European language family. Folia Linguist. Hist. 35, 307–358 (2014).

26. Bentz, C., Verkerk, A., Kiela, D., Hill, F. & Buttery, P. Adaptive communication: languages with more non-native speakers tend to have fewer word forms. PLoS ONE 10, e0128254 (2015).

27. Everett, C. Evidence for direct geographic influences on linguistic sounds: the case of ejectives. PLoS ONE 8, e65275 (2013).

28. Lupyan, G. & Dale, R. Language structure is partly determined by social structure. PLoS ONE 5, e8559 (2010).

29. Dale, R. & Lupyan, G. Understanding the origins of morphological diversity: the linguistic niche hypothesis. Adv. Complex Syst. 15, 1150017 (2012).

30. Bentz, C. & Winter, B. Languages with more second language speakers tend to lose nominal case. Lang. Dynam. Change 3, 1–27 (2013).

31. Revell, L. J., Harmon, L. J. & Collar, D. C. Phylogenetic signal, evolutionary process, and rate. Syst. Biol. 57, 591–601 (2008).

32. Thomason, S. G. & Kaufman, T. Language Contact, Creolization, and Genetic Linguistics (Univ. California Press, Berkeley & Oxford, 1988).

33. Diamond, J. M. Guns, Germs and Steel: The Fates of Human Societies (W. W. Norton, New York & London, 1999).

34. Güldemann, T. & Hammarström, H. in Language Dispersal, Diversification and Contact (eds Crevels, M. & Muysken, P.) (Oxford Univ. Press, Oxford, 2017).

35. Lupyan, G. & Dale, R. in Language Structure and Environment (eds De Busser, R. & LaPolla, R. J.) 289–316 (John Benjamins Publishing Company, Amsterdam, 2015).

36. Dediu, D. Making genealogical language classifications available for phylogenetic analysis: Newick trees, unified identifiers, and branch length. Lang. Dynam. Change 8, 1–21 (2018).

37. Lewis, M. P., Simons, G. F. & Fennig, C. D. Ethnologue: Languages of the World 17th edn (SIL International, Dallas, 2013); http://www.ethnologue.com

38. Dryer, M. S. & Haspelmath, M. The World Atlas of Language Structures Online (Max Planck Digital Library, 2013); http://wals.info/

39. Nichols, J., Witzlack-Makarevich, A. & Bickel, B. The AUTOTYP Genealogy and Geography Database 2013 Release (2013); https://www.spw.uzh.ch/autotyp/

40. Jäger, G. Global-scale phylogenetic linguistic inference from lexical resources. Preprint at http://arxiv.org/abs/1802.06079 (2018).

41. Münkemüller, T. et al. How to measure and test phylogenetic signal. Methods Ecol. Evol. 3, 743–756 (2012).

42. R Development Core Team R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2017).

43. Chamberlain, S. rgbif: Interface to the Global ‘Biodiversity’ Information Facility ‘API’, R package version 0.9.5 (2016); https://CRAN.R-project.org/package=rgbif

44. Pagel, M. Inferring evolutionary processes from phylogenies. Zool. Scr. 26, 331–348 (1997).

45. Pagel, M. Inferring the historical patterns of biological evolution. Nature 401, 877–884 (1999).

46. Freckleton, R. P., Harvey, P. H. & Pagel, M. Phylogenetic analysis and comparative data: a test and review of evidence. Am. Nat. 160, 712–726 (2002).

47. Revell, L. J. phytools: an R package for phylogenetic comparative biology (and other things). Methods Ecol. Evol. 3, 217–223 (2012).

48. Wickham, H. ggplot2: Elegant Graphics for Data Analysis (Springer, New York, 2016).

49. Yu, G., Smith, D. K., Zhu, H., Guan, Y. & Lam, T. T-Y. ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods Ecol. Evol. 8, 28–36 (2017).

50. Kahle, D. & Wickham, H. ggmap: spatial visualization with ggplot2. R. J. 5, 144–161 (2013).



Acknowledgements

C.B. and G.J. were funded by the German Research Foundation (DFG FOR 2237; project ‘Words, Bones, Genes, Tools: Tracking Linguistic, Cultural, and Biological Trajectories of the Human Past’) and the ERC Advanced Grant 324246 EVOLAEMP. D.D. was funded by The Netherlands Organisation for Scientific Research VIDI grant 276-70-022 and the European Institutes for Advanced Study Fellowship Program. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

C.B. was responsible for project inception, statistical and phylogenetic analyses, and writing of the paper. D.D., A.V. and G.J. contributed data, phylogenetic analyses and writing.

Competing interests

The authors declare no competing interests.

Correspondence to Christian Bentz.

Supplementary information

Supplementary Information

Supplementary Results 1–4, Supplementary Methods 1–6, Supplementary Note 1

Reporting Summary

SI Guide

Supplementary Data Set 1

Dediu’s forest data

Supplementary Data Set 2

Maximum likelihood trees data

Supplementary Data Set 3

Environmental variables data

Supplementary Data Set 4

All phylogenetic signals data

Supplementary Data Set 5

Phylogenetic signals for distances to lakes, rivers, and oceans data

Supplementary Data Set 6

Wilcoxon test results by tree set

Supplementary Data Set 7

Wilcoxon results by family

R code file

R analysis code files


Fig. 1: Density distributions of phylogenetic signals for K and λ, and violin plots with distributions per environmental factor.
Fig. 2: Environmental factors reflected on family trees.
Fig. 3: Percentages of subsets in line with evolutionary hypotheses.