The study of language origin and divergence is important for understanding the history of human populations and their cultures. The Sino-Tibetan language family is the second largest in the world after Indo-European, and there is a long-running debate about its phylogeny and the time depth of its original divergence [1]. Here we perform a Bayesian phylogenetic analysis to examine two competing hypotheses of the origin of the Sino-Tibetan language family: the ‘northern-origin hypothesis’ and the ‘southwestern-origin hypothesis’. The northern-origin hypothesis states that the initial expansion of Sino-Tibetan languages occurred approximately 4,000–6,000 years before present (BP; taken as AD 1950) in the Yellow River basin of northern China [2–4], and that this expansion is associated with the development of the Yangshao and/or Majiayao Neolithic cultures. The southwestern-origin hypothesis states that an early expansion of Sino-Tibetan languages occurred before 9,000 years BP from a region in southwest Sichuan province in China [5] or in northeast India [6], where a high diversity of Tibeto-Burman languages exists today. Consistent with the northern-origin hypothesis, our Bayesian phylogenetic analysis of 109 languages with 949 lexical root-meanings produced an estimated time depth for the divergence of Sino-Tibetan languages of approximately 4,200–7,800 years BP, with an average value of approximately 5,900 years BP. In addition, the phylogeny supported a dichotomy between Sinitic and Tibeto-Burman languages. Our results are compatible with the archaeological records, and with the farming and language dispersal hypothesis [7] of agricultural expansion in China. Our findings provide a linguistic foothold for further interdisciplinary studies of prehistoric human activity in East Asia.
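The comparison between the estimated root age and the two hypotheses reduces to a check for interval overlap. The following is a minimal sketch that uses only the figures quoted in the abstract. The function names, and the treatment of "before 9,000 years BP" as an open-ended interval, are illustrative assumptions and not part of the authors' analysis.

```python
import math

# Age ranges in years BP (before AD 1950), taken from the abstract.
ESTIMATE = (4200, 7800)           # estimated range for the root age
NORTHERN = (4000, 6000)           # northern-origin hypothesis
SOUTHWESTERN = (9000, math.inf)   # "before 9,000 years BP" (assumed open-ended)


def overlaps(a, b):
    """Return True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def bp_to_astronomical_year(bp):
    """Convert years BP to an astronomical year number (0 BP = AD 1950)."""
    return 1950 - bp


northern_ok = overlaps(ESTIMATE, NORTHERN)
southwestern_ok = overlaps(ESTIMATE, SOUTHWESTERN)
mean_year = bp_to_astronomical_year(5900)
```

Under these assumptions the estimate overlaps the northern-origin range and falls short of the southwestern-origin threshold. The mean of 5,900 years BP maps to astronomical year -3950, that is, 3951 BC, because the BC/AD scheme has no year zero.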
The data supporting the findings of this study are available in the Supplementary Information. Any other relevant data are available from the corresponding author upon reasonable request.
The code supporting the findings of this study is available in the Supplementary Information.
Handel, Z. What is Sino-Tibetan? Snapshot of a field and a language family in flux. Lang. Linguist. Compass 2, 422–441 (2008).
LaPolla, R. J. in Areal Diffusion and Genetic Inheritance: Problems in Comparative Linguistics (eds Dixon, R. M. W. & Aikhenvald, A. Y.) 225–254 (Oxford Univ. Press, Oxford, 2001).
Matisoff, J. A. Sino-Tibetan linguistics: present state and future prospects. Annu. Rev. Anthropol. 20, 469–504 (1991).
LaPolla, R. J. & Thurgood, G. Sino-Tibetan Languages (Routledge, London, 2016).
van Driem, G. in The Peopling of East Asia: Putting Together Archaeology, Linguistics and Genetics (eds Sagart, L. et al.) 81–106 (Routledge, London, 2005).
Blench, R. & Post, M. in Trans-Himalayan Linguistics: Historical and Descriptive Linguistics of the Himalayan Area (eds Hill, N. & Owen-Smith, T.) 71–104 (De Gruyter Mouton, Berlin, 2013).
Bellwood, P. in The Peopling of East Asia (eds Sagart, L. et al.) 41–54 (Routledge, London, 2005).
Cavalli-Sforza, L. L., Piazza, A., Menozzi, P. & Mountain, J. Reconstruction of human evolution: bringing together genetic, archaeological, and linguistic data. Proc. Natl Acad. Sci. USA 85, 6002–6006 (1988).
Longobardi, G. et al. Across language families: genome diversity mirrors linguistic variation within Europe. Am. J. Phys. Anthropol. 157, 630–640 (2015).
Simons, G. F. & Fennig, C. D. Ethnologue: Languages of the World 21st edn (SIL International, Dallas, 2018).
Peiros, I. & Starostin, S. A Comparative Vocabulary of Five Sino-Tibetan Languages (Univ. of Melbourne, Department of Linguistics and Applied Linguistics, Melbourne, 1996).
Shafer, R. Classification of the Sino-Tibetan languages. Word 11, 94–111 (1955).
Fei, X.-T. On the problem of distinguishing nationalities in China. Soc. Sci. China 1, 158–174 (1980).
Janhunen, J. Manchuria: an Ethnic History (Finno-Ugrian Society, Helsinki, 1996).
Campbell, L. Historical Linguistics (Edinburgh Univ. Press, Edinburgh, 2013).
Lees, R. B. The basis of glottochronology. Language 29, 113–127 (1953).
Greenhill, S. J., Atkinson, Q. D., Meade, A. & Gray, R. D. The shape and tempo of language evolution. Proc. R. Soc. Lond. B 277, 2443–2450 (2010).
Aikhenvald, A. Y. & Dixon, R. M. W. in Areal Diffusion and Genetic Inheritance: Problems in Comparative Linguistics (eds Dixon, R. M. W. & Aikhenvald, A. Y.) 1–26 (Oxford Univ. Press, Oxford, 2001).
Gray, R. D., Drummond, A. J. & Greenhill, S. J. Language phylogenies reveal expansion pulses and pauses in Pacific settlement. Science 323, 479–483 (2009).
Bouckaert, R. et al. Mapping the origins and expansion of the Indo-European language family. Science 337, 957–960 (2012).
Kolipakam, V. et al. A Bayesian phylogenetic study of the Dravidian language family. R. Soc. Open Sci. 5, 171504 (2018).
Swadesh, M. Towards greater accuracy in lexicostatistic dating. Int. J. Am. Linguist. 21, 121–137 (1955).
Liu, L. & Chen, X. The Archaeology of China: From the Late Paleolithic to the Early Bronze Age (Cambridge Univ. Press, Cambridge, 2012).
Su, B. et al. Y chromosome haplotypes reveal prehistorical migrations to the Himalayas. Hum. Genet. 107, 582–590 (2000).
Stevens, C. J. & Fuller, D. Q. The spread of agriculture in eastern Asia. Lang. Dyn. Chang. 7, 152–186 (2017).
Diamond, J. & Bellwood, P. Farmers and their languages: the first expansions. Science 300, 597–603 (2003).
Li, K. S. Yunnan kaoguxue lunji (in Chinese) (Yunnan People’s Publishing House, Kunming, 1998).
Handel, Z. J. Old Chinese Medials and Their Sino-Tibetan Origins: A Comparative Study (Institute of Linguistics, Academia Sinica, 2009).
Sagart, L., Blench, R. & Sanchez-Mazas, A. in The Peopling of East Asia: Putting Together Archaeology, Linguistics and Genetics (eds Sagart, L. et al.) 1–14 (Routledge, London, 2005).
Hosner, D., Wagner, M., Tarasov, P. E., Chen, X. & Leipe, C. Spatiotemporal distribution patterns of archaeological sites in China during the Neolithic and Bronze Age: an overview. Holocene 26, 1576–1593 (2016).
Dryer, M. S. in The Sino-Tibetan Languages (eds Thurgood, G. & LaPolla, R. J.) 43–55 (Routledge, London, 2003).
Hammarström, H., Forkel, R. & Haspelmath, M. Glottolog 3.1 http://glottolog.org (Max Planck Institute for the Science of Human History, Jena, 2017).
Bouckaert, R. et al. BEAST 2: a software platform for Bayesian evolutionary analysis. PLOS Comput. Biol. 10, e1003537 (2014).
Dunn, M., Kruspe, N. & Burenhult, N. Time and place in the prehistory of the Aslian languages. Hum. Biol. 85, 383–400 (2013).
Rambaut, A. et al. Tracer v.1.6 http://beast.bio.ed.ac.uk (2014).
Cleveland, W. S. LOWESS: A program for smoothing scatterplots by robust locally weighted regression. Am. Stat. 35, 54 (1981).
This study is supported by projects at the National Natural Science Foundation of China (31521003, 31501010 and 31401060), the Postdoctoral Science Foundation of China (2015M570316 and 2015T80394), the Special Program for Key Basic Research of the Ministry of Science and Technology of the People’s Republic of China (2015FY111700), the Science and Technology Commission of Shanghai Municipality (16JC1400500 and 2017SHZDZX01) and the National Social Science Fund of China (13&ZD132 and 18ZDA296).
Nature thanks Joshua B. Plotkin and the other anonymous reviewer(s) for their contribution to the peer review of this work.
The authors declare no competing interests.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 The geographical distribution of samples of 109 Sino-Tibetan languages in East Asia.
The colours show affiliations to linguistic clades. The map is based on vector map data from https://www.naturalearthdata.com.
The likelihood values for different combinations of mutation models, clock models and rate heterogeneity models. Each combination was run for 50 million generations, sampling every 5,000 generations. The first 10% of the iterations were treated as burn-in.
Extended Data Fig. 3 The maximum clade credibility tree of 109 Sino-Tibetan languages with node bars of ages and posterior probability values.
The iterations for tree reconstruction were set to 50 million generations, sampling trees every 5,000 generations and resulting in a sample of 10,000 trees. The first 10% of the iterations were treated as burn-in. The maximum clade credibility tree was established from 9,000 trees.
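The tree counts in this caption follow from the chain length, the sampling interval and the burn-in percentage. A minimal sketch of that arithmetic is given below; the function name `retained_samples` is hypothetical, and the calculation ignores any initial state that a particular program might also log.

```python
def retained_samples(chain_length, sample_every, burn_in_percent):
    """Return (total_samples, samples_after_burn_in) for an MCMC run.

    Integer arithmetic is used throughout so the counts are exact.
    """
    total = chain_length // sample_every
    discarded = total * burn_in_percent // 100
    return total, total - discarded


# Settings quoted in the caption: 50 million generations, one tree every
# 5,000 generations, first 10% discarded as burn-in.
total_trees, kept_trees = retained_samples(50_000_000, 5_000, 10)
```

With these settings `total_trees` is 10,000 and `kept_trees` is 9,000, matching the numbers stated in the caption.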
Extended Data Fig. 4 The results of four-point analysis for the maximum clade credibility tree of 109 Sino-Tibetan languages.
The maximum clade credibility tree was constructed on the basis of the best-fitting model combination, which was run for 50 million generations, sampling every 5,000 generations and treating the first 10% of the iterations as burn-in. The 109 Sino-Tibetan languages were grouped into major linguistic clades, which are labelled with the same colours and names as those shown in Fig. 1. The black numbers show the posterior probability values supporting the descendent clade. The red numbers in the parentheses show the reliability values on the internal nodes, calculated from four-point analysis.
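The caption does not describe how the reliability values were computed, so the sketch below shows only the textbook four-point condition on which four-point analyses are generally based. It is not the authors' procedure. For any four taxa whose distances fit a tree, the two largest of the three pairwise-sum totals are equal. The helper names and the toy distances are hypothetical.

```python
def symmetric(pairs):
    """Build a symmetric nested-dict distance table from {(u, v): d}."""
    dist = {}
    for (u, v), value in pairs.items():
        dist.setdefault(u, {})[v] = value
        dist.setdefault(v, {})[u] = value
    return dist


def pairing_sums(dist, w, x, y, z):
    """Return the three pairwise-sum totals for a quartet, sorted ascending."""
    return sorted([
        dist[w][x] + dist[y][z],
        dist[w][y] + dist[x][z],
        dist[w][z] + dist[x][y],
    ])


def satisfies_four_point(dist, w, x, y, z, tol=1e-9):
    """Four-point condition: the two largest totals are equal within tol."""
    sums = pairing_sums(dist, w, x, y, z)
    return abs(sums[2] - sums[1]) <= tol


# Toy distances from the tree ((A,B),(C,D)): pendant edges of length 1,
# internal edge of length 2.
tree_like = symmetric({
    ("A", "B"): 2, ("C", "D"): 2,
    ("A", "C"): 4, ("A", "D"): 4,
    ("B", "C"): 4, ("B", "D"): 4,
})

# Same table with one distance perturbed, so no tree fits it exactly.
perturbed = symmetric({
    ("A", "B"): 2, ("C", "D"): 2,
    ("A", "C"): 5, ("A", "D"): 4,
    ("B", "C"): 4, ("B", "D"): 4,
})

tree_ok = satisfies_four_point(tree_like, "A", "B", "C", "D")
perturbed_ok = satisfies_four_point(perturbed, "A", "B", "C", "D")
```

The gap between the two largest totals can also be read as a rough measure of how far a quartet departs from tree-likeness, which is one plausible basis for a per-node reliability score.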
The root time estimates for the 109 Sino-Tibetan languages with different combinations of mutation models, clock models and rate heterogeneity models. Each combination was run for 50 million generations, sampling every 5,000 generations. The first 10% of the iterations were treated as burn-in.
The root time estimates for the 107 Tibeto-Burman languages (that is, excluding ‘Chinese Mandarin’ and ‘Chinese Old’ from the Sino-Tibetan sample set) with different combinations of mutation models, clock models and rate heterogeneity models. Each combination was run for 50 million generations, sampling every 5,000 generations. The first 10% of the iterations were treated as burn-in.
The probability density estimates for the original homeland of the Sino-Tibetan languages, obtained with the phylogeographical approach implemented in the BayesTraits package. The number of iterations in BayesTraits was set to 1,000,000 and the sampling period to 1,000. The first 25% of the iterations were treated as burn-in. The map is based on vector map data from https://www.naturalearthdata.com.
This file contains the Supplementary Methods, Supplementary Discussion, Supplementary references and Supplementary Tables 2–4.
This table contains the information and lexical root-meanings of the Sino-Tibetan languages used in this paper, selected from the STEDT database according to the Swadesh 100-word list.
This zipped file contains three files for the phylogenetic study of Sino-Tibetan languages: a nexus file, an XML file and a tree file of the MCC tree.
This zipped file contains raw data for estimating the Sino-Tibetan evolutionary tempo, and MATLAB code for generating the results shown in Figure 2.
This zipped file contains raw data and MATLAB code for applying four-point analysis to the Sino-Tibetan languages.
About this article
Cite this article
Zhang, M., Yan, S., Pan, W. et al. Phylogenetic evidence for Sino-Tibetan origin in northern China in the Late Neolithic. Nature 569, 112–115 (2019). https://doi.org/10.1038/s41586-019-1153-z