Phylogenetic evidence for Sino-Tibetan origin in northern China in the Late Neolithic

Abstract

The study of language origin and divergence is important for understanding the history of human populations and their cultures. The Sino-Tibetan language family is the second largest in the world after Indo-European, and there is a long-running debate about its phylogeny and the time depth of its original divergence [1]. Here we perform a Bayesian phylogenetic analysis to examine two competing hypotheses of the origin of the Sino-Tibetan language family: the ‘northern-origin hypothesis’ and the ‘southwestern-origin hypothesis’. The northern-origin hypothesis states that the initial expansion of Sino-Tibetan languages occurred approximately 4,000–6,000 years before present (BP; taken as AD 1950) in the Yellow River basin of northern China [2,3,4], and that this expansion is associated with the development of the Yangshao and/or Majiayao Neolithic cultures. The southwestern-origin hypothesis states that an early expansion of Sino-Tibetan languages occurred before 9,000 years BP from a region in southwest Sichuan province in China [5] or in northeast India [6], where a high diversity of Tibeto-Burman languages exists today. Consistent with the northern-origin hypothesis, our Bayesian phylogenetic analysis of 109 languages with 949 lexical root-meanings produced an estimated time depth for the divergence of Sino-Tibetan languages of approximately 4,200–7,800 years BP, with an average value of approximately 5,900 years BP. In addition, the phylogeny supported a dichotomy between Sinitic and Tibeto-Burman languages. Our results are compatible with the archaeological records, and with the farming and language dispersal hypothesis [7] of agricultural expansion in China. Our findings provide a linguistic foothold for further interdisciplinary studies of prehistoric human activity in East Asia.

Fig. 1: The maximum clade credibility tree and the geographical distribution of the Sino-Tibetan languages.
Fig. 2: The tempo of divergence of the Sino-Tibetan languages and the changes in the number of archaeological sites in China.

Data availability

The data supporting the findings of this study are available in the Supplementary Information. Any other relevant data are available from the corresponding author upon reasonable request.

Code availability

The code supporting the findings of this study is available in the Supplementary Information.

References

  1. Handel, Z. What is Sino-Tibetan? Snapshot of a field and a language family in flux. Lang. Linguist. Compass 2, 422–441 (2008).

  2. LaPolla, R. J. in Areal Diffusion and Genetic Inheritance: Problems in Comparative Linguistics (eds Dixon, R. M. W. & Aikhenvald, A. Y.) 225–254 (Oxford Univ. Press, Oxford, 2001).

  3. Matisoff, J. A. Sino-Tibetan linguistics: present state and future prospects. Annu. Rev. Anthropol. 20, 469–504 (1991).

  4. LaPolla, R. J. & Thurgood, G. Sino-Tibetan Languages (Routledge, London, 2016).

  5. van Driem, G. in The Peopling of East Asia: Putting Together Archaeology, Linguistics and Genetics (eds Sagart, L. et al.) 81–106 (Routledge, London, 2005).

  6. Blench, R. & Post, M. in Trans-Himalayan Linguistics: Historical and Descriptive Linguistics of the Himalayan Area (eds Hill, N. & Owen-Smith, T.) 71–104 (De Gruyter Mouton, Berlin, 2013).

  7. Bellwood, P. in The Peopling of East Asia (eds Sagart, L. et al.) 41–54 (Routledge, London, 2005).

  8. Cavalli-Sforza, L. L., Piazza, A., Menozzi, P. & Mountain, J. Reconstruction of human evolution: bringing together genetic, archaeological, and linguistic data. Proc. Natl Acad. Sci. USA 85, 6002–6006 (1988).

  9. Longobardi, G. et al. Across language families: genome diversity mirrors linguistic variation within Europe. Am. J. Phys. Anthropol. 157, 630–640 (2015).

  10. Simons, G. F. & Fennig, C. D. Ethnologue: Languages of the World 21st edn (SIL International, Dallas, 2018).

  11. Peiros, I. & Starostin, S. A Comparative Vocabulary of Five Sino-Tibetan Languages (Univ. of Melbourne, Department of Linguistics and Applied Linguistics, Melbourne, 1996).

  12. Shafer, R. Classification of the Sino-Tibetan languages. Word 11, 94–111 (1955).

  13. Fei, X.-T. On the problem of distinguishing nationalities in China. Soc. Sci. China 1, 158–174 (1980).

  14. Janhunen, J. Manchuria: an Ethnic History (Finno-Ugrian Society, Helsinki, 1996).

  15. Campbell, L. Historical Linguistics (Edinburgh Univ. Press, Edinburgh, 2013).

  16. Lees, R. B. The basis of glottochronology. Language 29, 113–127 (1953).

  17. Greenhill, S. J., Atkinson, Q. D., Meade, A. & Gray, R. D. The shape and tempo of language evolution. Proc. R. Soc. Lond. B 277, 2443–2450 (2010).

  18. Aikhenvald, A. Y. & Dixon, R. M. W. in Areal Diffusion and Genetic Inheritance: Problems in Comparative Linguistics (eds Dixon, R. M. W. & Aikhenvald, A. Y.) 1–26 (Oxford Univ. Press, Oxford, 2001).

  19. Gray, R. D., Drummond, A. J. & Greenhill, S. J. Language phylogenies reveal expansion pulses and pauses in Pacific settlement. Science 323, 479–483 (2009).

  20. Bouckaert, R. et al. Mapping the origins and expansion of the Indo-European language family. Science 337, 957–960 (2012).

  21. Kolipakam, V. et al. A Bayesian phylogenetic study of the Dravidian language family. R. Soc. Open Sci. 5, 171504 (2018).

  22. Swadesh, M. Towards greater accuracy in lexicostatistic dating. Int. J. Am. Linguist. 21, 121–137 (1955).

  23. Liu, L. & Chen, X. The Archaeology of China: From the Late Paleolithic to the Early Bronze Age (Cambridge Univ. Press, Cambridge, 2012).

  24. Su, B. et al. Y chromosome haplotypes reveal prehistorical migrations to the Himalayas. Hum. Genet. 107, 582–590 (2000).

  25. Stevens, C. J. & Fuller, D. Q. The spread of agriculture in eastern Asia. Lang. Dyn. Chang. 7, 152–186 (2017).

  26. Diamond, J. & Bellwood, P. Farmers and their languages: the first expansions. Science 300, 597–603 (2003).

  27. Li, K. S. Yunnan kaoguxue lunji (in Chinese) (Yunnan People’s Publishing House, Kunming, 1998).

  28. Handel, Z. J. Old Chinese Medials and Their Sino-Tibetan Origins: A Comparative Study (Institute of Linguistics, Academia Sinica, 2009).

  29. Sagart, L., Blench, R. & Sanchez-Mazas, A. in The Peopling of East Asia: Putting Together Archaeology, Linguistics and Genetics (eds Sagart, L. et al.) 1–14 (Routledge, London, 2005).

  30. Hosner, D., Wagner, M., Tarasov, P. E., Chen, X. & Leipe, C. Spatiotemporal distribution patterns of archaeological sites in China during the Neolithic and Bronze Age: an overview. Holocene 26, 1576–1593 (2016).

  31. Dryer, M. S. in The Sino-Tibetan Languages (eds Thurgood, G. & LaPolla, R. J.) 43–55 (Routledge, London, 2003).

  32. Hammarström, H., Forkel, R. & Haspelmath, M. Glottolog 3.1 http://glottolog.org (Max Planck Institute for the Science of Human History, Jena, 2017).

  33. Bouckaert, R. et al. BEAST 2: a software platform for Bayesian evolutionary analysis. PLOS Comput. Biol. 10, e1003537 (2014).

  34. Dunn, M., Kruspe, N. & Burenhult, N. Time and place in the prehistory of the Aslian languages. Hum. Biol. 85, 383–400 (2013).

  35. Rambaut, A. et al. Tracer v.1.6 http://beast.bio.ed.ac.uk (2014).

  36. Cleveland, W. S. LOWESS: A program for smoothing scatterplots by robust locally weighted regression. Am. Stat. 35, 54 (1981).

Acknowledgements

This study is supported by projects at the National Natural Science Foundation of China (31521003, 31501010 and 31401060), the Postdoctoral Science Foundation of China (2015M570316 and 2015T80394), the Special Program for Key Basic Research of the Ministry of Science and Technology of the People’s Republic of China (2015FY111700), the Science and Technology Commission of Shanghai Municipality (16JC1400500 and 2017SHZDZX01) and the National Social Science Fund of China (13&ZD132 and 18ZDA296).

Reviewer information

Nature thanks Joshua B. Plotkin and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Author information

Contributions

M.Z., S.Y., W.P. and L.J. designed the research; M.Z., S.Y. and W.P. collated the linguistic and geographical data; M.Z. and S.Y. performed the research; M.Z., S.Y., W.P. and L.J. analysed the results; and M.Z., S.Y. and L.J. wrote the paper.

Corresponding author

Correspondence to Li Jin.

Ethics declarations

Competing interests

The authors declare no competing interests.

Extended data figures and tables

Extended Data Fig. 1 The geographical distribution of samples of 109 Sino-Tibetan languages in East Asia.

The colours show affiliations to linguistic clades. The map is based on vector map data from https://www.naturalearthdata.com.

Extended Data Fig. 2 The distributions of the likelihood values.

The likelihood values for different combinations of mutation models, clock models and rate heterogeneity models. Each combination was run for 50 million generations, sampling every 5,000 generations. The first 10% of the iterations were treated as burn-in.

Extended Data Fig. 3 The maximum clade credibility tree of 109 Sino-Tibetan languages with node bars of ages and posterior probability values.

The iterations for tree reconstruction were set to 50 million generations, sampling trees every 5,000 generations and resulting in a sample of 10,000 trees. The first 10% of the iterations were treated as burn-in. The maximum clade credibility tree was established from 9,000 trees.
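The tree counts in this caption follow from simple arithmetic on the chain settings. The short Python sketch below reproduces that arithmetic; the function name `mcmc_sample_counts` is hypothetical, and the calculation assumes that the initial state of the chain is not counted as a sample, which is what the stated total of 10,000 trees implies.

```python
def mcmc_sample_counts(chain_length, sample_every, burn_in_fraction):
    """Return (total, discarded, retained) sample counts for one MCMC run.

    Assumption: the initial state (generation 0) is not counted as a sample.
    """
    total = chain_length // sample_every          # samples written to the log
    discarded = round(total * burn_in_fraction)   # samples dropped as burn-in
    retained = total - discarded                  # samples left for the summary tree
    return total, discarded, retained


# Settings stated in the caption: 50 million generations, one tree every
# 5,000 generations, first 10% treated as burn-in.
total, discarded, retained = mcmc_sample_counts(50_000_000, 5_000, 0.10)
print(total, discarded, retained)  # 10000 1000 9000
```

The retained count of 9,000 matches the number of trees from which the maximum clade credibility tree was established.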

Extended Data Fig. 4 The results of four-point analysis for the maximum clade credibility tree of 109 Sino-Tibetan languages.

The maximum clade credibility tree was constructed on the basis of the best-fitting model combination, which was run for 50 million generations, sampling every 5,000 generations and treating the first 10% of the iterations as burn-in. The 109 Sino-Tibetan languages were grouped into major linguistic clades, which are labelled with the same colours and names as those shown in Fig. 1. The black numbers show the posterior probability values supporting the descendent clade. The red numbers in the parentheses show the reliability values on the internal nodes, calculated from four-point analysis.
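For readers unfamiliar with the idea behind four-point analysis, the sketch below illustrates the classical four-point condition on a distance matrix: for any four taxa, the two largest of the three pairwise distance sums are equal when the distances fit a tree exactly. This is a general illustration only. It is not the authors' procedure for computing the reliability values shown in the figure (their Matlab code is in Supplementary Data 3), and the function names and toy distances are hypothetical.

```python
from itertools import combinations


def four_point_gap(d, a, b, c, e):
    """Gap between the two largest pairwise-sum pairings for one quartet.

    A gap of 0 means the quartet satisfies the four-point condition exactly.
    """
    sums = sorted([
        d[a][b] + d[c][e],
        d[a][c] + d[b][e],
        d[a][e] + d[b][c],
    ])
    return sums[2] - sums[1]


def max_four_point_gap(d):
    """Largest four-point gap over all quartets of a symmetric distance matrix."""
    return max(four_point_gap(d, *q) for q in combinations(range(len(d)), 4))


# Toy distances (invented) that fit the tree ((A,B),(C,D)) exactly, with
# terminal branch lengths 1, 2, 4, 5 and an internal branch of length 3.
tree_like = [
    [0, 3, 8, 9],
    [3, 0, 9, 10],
    [8, 9, 0, 9],
    [9, 10, 9, 0],
]
print(max_four_point_gap(tree_like))  # 0
```

A non-zero gap indicates that the distances among those four taxa cannot be represented exactly by any tree, which is one way to quantify conflicting signal around an internal node.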

Extended Data Fig. 5 The distribution of the root time of 109 Sino-Tibetan languages.

The root time estimates for the 109 Sino-Tibetan languages with different combinations of mutation models, clock models and rate heterogeneity models. Each combination was run for 50 million generations, sampling every 5,000 generations. The first 10% of the iterations were treated as burn-in.
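A distribution of root-time estimates such as this one is often summarized by an interval that contains most of the posterior mass. The sketch below computes a highest posterior density (HPD) interval from a list of sampled values. It is offered as an assumption about how such a summary could be produced, not as a description of the authors' workflow; the function name `hpd_interval` and the sample values are invented for illustration and are not results from the study.

```python
def hpd_interval(samples, mass=0.95):
    """Return the narrowest interval containing about `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, round(mass * n))  # number of samples the interval must cover
    # Slide a window of k consecutive sorted samples and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


# Invented toy values (years BP), not data from the paper.
toy_root_ages = [5200, 5600, 5900, 6100, 6400, 9800]
print(hpd_interval(toy_root_ages, mass=0.8))  # (5200, 6400)
```

Unlike a symmetric percentile interval, the HPD interval excludes the single outlying value here because the narrowest window covering five of the six samples does not need it.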

Extended Data Fig. 6 The distribution of the root time of 107 Tibeto-Burman languages.

The root time estimates for the 107 Tibeto-Burman languages (that is, excluding ‘Chinese Mandarin’ and ‘Chinese Old’ from the Sino-Tibetan sample set) with different combinations of mutation models, clock models and rate heterogeneity models. Each combination was run for 50 million generations, sampling every 5,000 generations. The first 10% of the iterations were treated as burn-in.

Extended Data Fig. 7 The geographical plot of Urheimat inference of 109 Sino-Tibetan languages.

The probability density estimates for the original homeland of the Sino-Tibetan languages, obtained via the phylogeographical approach implemented in the BayesTraits package. The number of iterations in BayesTraits was set to 1,000,000 and the sample period to 1,000. The first 25% of the iterations were treated as burn-in. The map is based on vector map data from https://www.naturalearthdata.com.

Supplementary information

Supplementary Information

This file contains the Supplementary Methods, Supplementary Discussion, Supplementary references and Supplementary Tables 2–4.

Reporting Summary

Supplementary Table 1

This table contains the information and lexical root-meanings of the Sino-Tibetan (ST) languages in this paper, selected from the STEDT database according to the Swadesh 100-word list.

Supplementary Data 1

This zipped file contains three files for the phylogenetic study of Sino-Tibetan languages: a nexus file, an XML file and a tree file of the MCC tree.

Supplementary Data 2

This zipped file contains raw data for estimating the Sino-Tibetan evolutionary tempo, and Matlab code for generating the results shown in Fig. 2.

Supplementary Data 3

This zipped file contains raw data and Matlab code for applying four-point analysis to the Sino-Tibetan languages.

About this article

Cite this article

Zhang, M., Yan, S., Pan, W. et al. Phylogenetic evidence for Sino-Tibetan origin in northern China in the Late Neolithic. Nature 569, 112–115 (2019). https://doi.org/10.1038/s41586-019-1153-z
