Letter

Phylogenetic evidence for Sino-Tibetan origin in northern China in the Late Neolithic

Nature volume 569, pages 112–115 (2019)

Abstract

The study of language origin and divergence is important for understanding the history of human populations and their cultures. The Sino-Tibetan language family is the second largest in the world after Indo-European, and there is a long-running debate about its phylogeny and the time depth of its original divergence¹. Here we perform a Bayesian phylogenetic analysis to examine two competing hypotheses of the origin of the Sino-Tibetan language family: the ‘northern-origin hypothesis’ and the ‘southwestern-origin hypothesis’. The northern-origin hypothesis states that the initial expansion of Sino-Tibetan languages occurred approximately 4,000–6,000 years before present (BP; taken as AD 1950) in the Yellow River basin of northern China²⁻⁴, and that this expansion is associated with the development of the Yangshao and/or Majiayao Neolithic cultures. The southwestern-origin hypothesis states that an early expansion of Sino-Tibetan languages occurred before 9,000 years BP from a region in southwest Sichuan province in China⁵ or in northeast India⁶, where a high diversity of Tibeto-Burman languages exists today. Consistent with the northern-origin hypothesis, our Bayesian phylogenetic analysis of 109 languages with 949 lexical root-meanings produced an estimated time depth for the divergence of Sino-Tibetan languages of approximately 4,200–7,800 years BP, with an average value of approximately 5,900 years BP. In addition, the phylogeny supported a dichotomy between Sinitic and Tibeto-Burman languages. Our results are compatible with the archaeological records, and with the farming and language dispersal hypothesis⁷ of agricultural expansion in China. Our findings provide a linguistic foothold for further interdisciplinary studies of prehistoric human activity in East Asia.
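
Lexical datasets such as the 109-language, 949 root-meaning matrix described above are commonly coded as binary characters before Bayesian phylogenetic analysis: each (meaning, root) pair becomes one column, scored 1 if a language expresses that meaning with that root, 0 if the meaning is recorded with other roots only, and ? if the meaning is unrecorded. The Python sketch below illustrates that generic scheme under stated assumptions; the input format, the names `binary_matrix` and `to_nexus`, and the toy lexicon with placeholder roots are hypothetical and are not taken from the paper's Supplementary Data.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical input format: language -> meaning -> set of root labels.
# A meaning absent from a language's dict is treated as missing data.
Lexicon = Dict[str, Dict[str, Set[str]]]


def root_meaning_characters(lexicon: Lexicon) -> List[Tuple[str, str]]:
    """Collect every distinct (meaning, root) pair; each becomes one binary character."""
    chars = {(meaning, root)
             for forms in lexicon.values()
             for meaning, roots in forms.items()
             for root in roots}
    return sorted(chars)


def binary_matrix(lexicon: Lexicon) -> Dict[str, str]:
    """Score 1 (root present for that meaning), 0 (meaning recorded with other
    roots only) or ? (meaning not recorded for the language)."""
    chars = root_meaning_characters(lexicon)
    matrix: Dict[str, str] = {}
    for lang, forms in lexicon.items():
        row = []
        for meaning, root in chars:
            if meaning not in forms:
                row.append("?")
            elif root in forms[meaning]:
                row.append("1")
            else:
                row.append("0")
        matrix[lang] = "".join(row)
    return matrix


def to_nexus(matrix: Dict[str, str]) -> str:
    """Wrap the binary rows in a minimal NEXUS DATA block."""
    nchar = len(next(iter(matrix.values())))
    lines = [
        "#NEXUS",
        "BEGIN DATA;",
        f"  DIMENSIONS NTAX={len(matrix)} NCHAR={nchar};",
        '  FORMAT DATATYPE=STANDARD SYMBOLS="01" MISSING=?;',
        "  MATRIX",
    ]
    lines += [f"    '{lang}' {row}" for lang, row in matrix.items()]
    lines += ["  ;", "END;"]
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy data with placeholder root labels, not real Sino-Tibetan roots.
    toy = {
        "Lang_A": {"eye": {"R1"}, "fire": {"R4"}},
        "Lang_B": {"eye": {"R1"}, "fire": {"R3"}},
        "Lang_C": {"eye": {"R2"}},
    }
    print(binary_matrix(toy))  # {'Lang_A': '1001', 'Lang_B': '1010', 'Lang_C': '01??'}
    print(to_nexus(binary_matrix(toy)))
```

Real cognate coding also has to decide how synonyms, borrowings and uncertain etymologies are scored; those choices are deliberately left out of this sketch.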


Data availability

The data supporting the findings of this study are available in the Supplementary Information. Any other relevant data are available from the corresponding author upon reasonable request.

Code availability

The code supporting the findings of this study is available in the Supplementary Information.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  1. Handel, Z. What is Sino-Tibetan? Snapshot of a field and a language family in flux. Lang. Linguist. Compass 2, 422–441 (2008).

  2. LaPolla, R. J. in Areal Diffusion and Genetic Inheritance: Problems in Comparative Linguistics (eds Dixon, R. M. W. & Aikhenvald, A. Y.) 225–254 (Oxford Univ. Press, Oxford, 2001).

  3. Matisoff, J. A. Sino-Tibetan linguistics: present state and future prospects. Annu. Rev. Anthropol. 20, 469–504 (1991).

  4. LaPolla, R. J. & Thurgood, G. Sino-Tibetan Languages (Routledge, London, 2016).

  5. van Driem, G. in The Peopling of East Asia: Putting Together Archaeology, Linguistics and Genetics (eds Sagart, L. et al.) 81–106 (Routledge, London, 2005).

  6. Blench, R. & Post, M. in Trans-Himalayan Linguistics: Historical and Descriptive Linguistics of the Himalayan Area (eds Hill, N. & Owen-Smith, T.) 71–104 (De Gruyter Mouton, Berlin, 2013).

  7. Bellwood, P. in The Peopling of East Asia (eds Sagart, L. et al.) 41–54 (Routledge, London, 2005).

  8. Cavalli-Sforza, L. L., Piazza, A., Menozzi, P. & Mountain, J. Reconstruction of human evolution: bringing together genetic, archaeological, and linguistic data. Proc. Natl Acad. Sci. USA 85, 6002–6006 (1988).

  9. Longobardi, G. et al. Across language families: genome diversity mirrors linguistic variation within Europe. Am. J. Phys. Anthropol. 157, 630–640 (2015).

  10. Simons, G. F. & Fennig, C. D. Ethnologue: Languages of the World 21st edn (SIL International, Dallas, 2018).

  11. Peiros, I. & Starostin, S. A Comparative Vocabulary of Five Sino-Tibetan Languages (Univ. of Melbourne, Department of Linguistics and Applied Linguistics, Melbourne, 1996).

  12. Shafer, R. Classification of the Sino-Tibetan languages. Word 11, 94–111 (1955).

  13. Fei, X.-T. On the problem of distinguishing nationalities in China. Soc. Sci. China 1, 158–174 (1980).

  14. Janhunen, J. Manchuria: an Ethnic History (Finno-Ugrian Society, Helsinki, 1996).

  15. Campbell, L. Historical Linguistics (Edinburgh Univ. Press, Edinburgh, 2013).

  16. Lees, R. B. The basis of glottochronology. Language 29, 113–127 (1953).

  17. Greenhill, S. J., Atkinson, Q. D., Meade, A. & Gray, R. D. The shape and tempo of language evolution. Proc. R. Soc. Lond. B 277, 2443–2450 (2010).

  18. Aikhenvald, A. Y. & Dixon, R. M. W. in Areal Diffusion and Genetic Inheritance: Problems in Comparative Linguistics (eds Dixon, R. M. W. & Aikhenvald, A. Y.) 1–26 (Oxford Univ. Press, Oxford, 2001).

  19. Gray, R. D., Drummond, A. J. & Greenhill, S. J. Language phylogenies reveal expansion pulses and pauses in Pacific settlement. Science 323, 479–483 (2009).

  20. Bouckaert, R. et al. Mapping the origins and expansion of the Indo-European language family. Science 337, 957–960 (2012).

  21. Kolipakam, V. et al. A Bayesian phylogenetic study of the Dravidian language family. R. Soc. Open Sci. 5, 171504 (2018).

  22. Swadesh, M. Towards greater accuracy in lexicostatistic dating. Int. J. Am. Linguist. 21, 121–137 (1955).

  23. Liu, L. & Chen, X. The Archaeology of China: From the Late Paleolithic to the Early Bronze Age (Cambridge Univ. Press, Cambridge, 2012).

  24. Su, B. et al. Y chromosome haplotypes reveal prehistorical migrations to the Himalayas. Hum. Genet. 107, 582–590 (2000).

  25. Stevens, C. J. & Fuller, D. Q. The spread of agriculture in eastern Asia. Lang. Dyn. Chang. 7, 152–186 (2017).

  26. Diamond, J. & Bellwood, P. Farmers and their languages: the first expansions. Science 300, 597–603 (2003).

  27. Li, K. S. Yunnan kaoguxue lunji (in Chinese) (Yunnan People’s Publishing House, Kunming, 1998).

  28. Handel, Z. J. Old Chinese Medials and Their Sino-Tibetan Origins: A Comparative Study (Institute of Linguistics, Academia Sinica, 2009).

  29. Sagart, L., Blench, R. & Sanchez-Mazas, A. in The Peopling of East Asia: Putting Together Archaeology, Linguistics and Genetics (eds Sagart, L. et al.) 1–14 (Routledge, London, 2005).

  30. Hosner, D., Wagner, M., Tarasov, P. E., Chen, X. & Leipe, C. Spatiotemporal distribution patterns of archaeological sites in China during the Neolithic and Bronze Age: an overview. Holocene 26, 1576–1593 (2016).

  31. Dryer, M. S. in The Sino-Tibetan Languages (eds Thurgood, G. & LaPolla, R. J.) 43–55 (Routledge, London, 2003).

  32. Hammarström, H., Forkel, R. & Haspelmath, M. Glottolog 3.1 http://glottolog.org (Max Planck Institute for the Science of Human History, Jena, 2017).

  33. Bouckaert, R. et al. BEAST 2: a software platform for Bayesian evolutionary analysis. PLOS Comput. Biol. 10, e1003537 (2014).

  34. Dunn, M., Kruspe, N. & Burenhult, N. Time and place in the prehistory of the Aslian languages. Hum. Biol. 85, 383–400 (2013).

  35. Rambaut, A. et al. Tracer v.1.6 http://beast.bio.ed.ac.uk (2014).

  36. Cleveland, W. S. LOWESS: A program for smoothing scatterplots by robust locally weighted regression. Am. Stat. 35, 54 (1981).


Acknowledgements

This study is supported by projects of the National Natural Science Foundation of China (31521003, 31501010 and 31401060), the Postdoctoral Science Foundation of China (2015M570316 and 2015T80394), the Special Program for Key Basic Research of the Ministry of Science and Technology of the People’s Republic of China (2015FY111700), the Science and Technology Commission of Shanghai Municipality (16JC1400500 and 2017SHZDZX01) and the National Social Science Fund of China (13&ZD132 and 18ZDA296).

Reviewer information

Nature thanks Joshua B. Plotkin and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Author information

Author notes

  1. These authors contributed equally: Menghan Zhang, Shi Yan

Affiliations

  1. State Key Laboratory of Genetic Engineering, and Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai, China

    • Menghan Zhang
    • Li Jin
  2. Institute of Modern Languages and Linguistics, Fudan University, Shanghai, China

    • Menghan Zhang
  3. Human Phenome Institute, Fudan University, Shanghai, China

    • Shi Yan
    • Li Jin
  4. Ministry of Education Key Laboratory of Contemporary Anthropology, Department of Anthropology and Human Genetics, School of Life Sciences, Fudan University, Shanghai, China

    • Shi Yan
  5. Institute for Humanities and Social Science Data, School of Data Science, Fudan University, Shanghai, China

    • Wuyun Pan
  6. Institute of Linguistics, College of Humanities and Communications, Shanghai Normal University, Shanghai, China

    • Wuyun Pan
  7. Chinese Academy of Sciences Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, SIBS, CAS, Shanghai, China

    • Li Jin

Authors

  1. Menghan Zhang

  2. Shi Yan

  3. Wuyun Pan

  4. Li Jin

Contributions

M.Z., S.Y., W.P. and L.J. designed the research; M.Z., S.Y. and W.P. collated the linguistic and geographical data; M.Z. and S.Y. performed the research; M.Z., S.Y., W.P. and L.J. analysed the results; and M.Z., S.Y. and L.J. wrote the paper.

Competing interests

The authors declare no competing interests.

Corresponding author

Correspondence to Li Jin.

Extended data figures and tables

  1. Extended Data Fig. 1 The geographical distribution of samples of 109 Sino-Tibetan languages in East Asia.

    The colours show affiliations to linguistic clades. The map is based on vector map data from https://www.naturalearthdata.com.

  2. Extended Data Fig. 2 The distributions of the likelihood values.

    The likelihood values for different combinations of mutation models, clock models and rate heterogeneity models. Each combination was run for 50 million generations, sampling every 5,000 generations. The first 10% of the iterations were treated as burn-in.

  3. Extended Data Fig. 3 The maximum clade credibility tree of 109 Sino-Tibetan languages with node bars of ages and posterior probability values.

    The tree search was run for 50 million generations, sampling a tree every 5,000 generations, which yielded 10,000 trees. The first 10% of the iterations were discarded as burn-in, and the maximum clade credibility tree was built from the remaining 9,000 trees.

  4. Extended Data Fig. 4 The results of four-point analysis for the maximum clade credibility tree of 109 Sino-Tibetan languages.

    The maximum clade credibility tree was constructed on the basis of the best-fitting model combination, which was run for 50 million generations, sampling every 5,000 generations and treating the first 10% of the iterations as burn-in. The 109 Sino-Tibetan languages were grouped into major linguistic clades, which are labelled with the same colours and names as those shown in Fig. 1. The black numbers show the posterior probability values supporting the descendant clade. The red numbers in parentheses show the reliability values of the internal nodes, calculated from four-point analysis.

  5. Extended Data Fig. 5 The distribution of the root time of 109 Sino-Tibetan languages.

    The root time estimates for the 109 Sino-Tibetan languages with different combinations of mutation models, clock models and rate heterogeneity models. Each combination was run for 50 million generations, sampling every 5,000 generations. The first 10% of the iterations were treated as burn-in.

  6. Extended Data Fig. 6 The distribution of the root time of 107 Tibeto-Burman languages.

    The root time estimates for the 107 Tibeto-Burman languages (that is, excluding ‘Chinese Mandarin’ and ‘Chinese Old’ from the Sino-Tibetan sample set) with different combinations of mutation models, clock models and rate heterogeneity models. Each combination was run for 50 million generations, sampling every 5,000 generations. The first 10% of the iterations were treated as burn-in.

  7. Extended Data Fig. 7 The geographical plot of Urheimat inference of 109 Sino-Tibetan languages.

    The probability density estimates for the original homeland of the Sino-Tibetan languages, inferred with the phylogeographical approach implemented in the BayesTraits package. BayesTraits was run for 1,000,000 iterations with a sample period of 1,000, and the first 25% of the iterations were treated as burn-in. The map is based on vector map data from https://www.naturalearthdata.com.
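
The MCMC settings in these captions determine how many samples survive burn-in: 50 million generations sampled every 5,000 gives 10,000 trees, of which 9,000 remain after the first 10% is discarded, and the BayesTraits run (1,000,000 iterations, sample period 1,000, 25% burn-in) similarly leaves about 750, give or take one depending on whether the initial state is also logged. Root-age ranges such as the 4,200–7,800 years BP in the abstract are often summarized from retained samples as a 95% highest posterior density (HPD) interval, as Tracer reports; we assume, without confirmation from the paper, that this is how the range was obtained. The Python sketch below shows the bookkeeping and an empirical HPD computation on synthetic draws; the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def retained_samples(generations: int, sample_every: int, burnin_fraction: float) -> int:
    """Samples kept after burn-in, assuming one sample per `sample_every`
    generations and no extra sample logged at generation 0."""
    logged = generations // sample_every
    return logged - int(round(logged * burnin_fraction))


def discard_burnin(samples: Sequence[float], burnin_fraction: float) -> List[float]:
    """Drop the leading fraction of an MCMC trace."""
    cut = int(round(len(samples) * burnin_fraction))
    return list(samples[cut:])


def hpd_interval(samples: Sequence[float], mass: float = 0.95) -> Tuple[float, float]:
    """Shortest interval containing `mass` of the samples (empirical HPD)."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of samples the interval must cover
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


if __name__ == "__main__":
    print(retained_samples(50_000_000, 5_000, 0.10))  # 9000 (Extended Data Fig. 3 settings)
    print(retained_samples(1_000_000, 1_000, 0.25))   # 750 (Extended Data Fig. 7 settings)

    # Synthetic root ages in years BP; NOT the study's posterior samples.
    random.seed(0)
    trace = [random.gauss(5900, 900) for _ in range(10_000)]
    kept = discard_burnin(trace, 0.10)
    lo, hi = hpd_interval(kept)
    mean = sum(kept) / len(kept)
    print(len(kept), round(mean), round(lo), round(hi))  # first value is 9000
```

The HPD is the shortest window covering the requested mass, so for a skewed posterior it can differ noticeably from an equal-tailed interval taken between the 2.5% and 97.5% quantiles.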

Supplementary information

  1. Supplementary Information

    This file contains the Supplementary Methods, Supplementary Discussion, Supplementary references and Supplementary Tables 2–4.

  2. Reporting Summary

  3. Supplementary Table 1

    This table contains information on the Sino-Tibetan (ST) languages in this paper and their lexical root-meanings, selected from the STEDT database according to the Swadesh 100-word list.

  4. Supplementary Data 1

    This zipped file contains three files for the phylogenetic study of Sino-Tibetan languages: a NEXUS file, an XML file and a tree file of the maximum clade credibility (MCC) tree.

  5. Supplementary Data 2

    This zipped file contains raw data for estimating the Sino-Tibetan evolutionary tempo, and MATLAB code for generating the results shown in Fig. 2.

  6. Supplementary Data 3

    This zipped file contains raw data and MATLAB code for applying four-point analysis to the Sino-Tibetan languages.
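
The four-point analysis behind the node reliability values in Extended Data Fig. 4 is specified in the Supplementary Information, not on this page. As a related and widely used illustration, the Python sketch below computes the δ score of Holland et al. (2002), which measures how far each quartet of languages departs from the four-point condition satisfied by every tree metric: of the three sums d(i,j)+d(k,m), d(i,k)+d(j,m) and d(i,m)+d(j,k), the two largest must be equal. This is a swapped-in technique, not necessarily the authors' reliability measure, and the function names and toy distance matrices are hypothetical.

```python
from itertools import combinations
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def quartet_delta(d: Matrix, i: int, j: int, k: int, m: int) -> float:
    """Delta for one quartet: 0 if it satisfies the four-point condition
    (two largest pairwise sums equal), approaching 1 as it departs from it."""
    s_large, s_mid, s_small = sorted(
        (d[i][j] + d[k][m], d[i][k] + d[j][m], d[i][m] + d[j][k]), reverse=True
    )
    if s_large == s_small:  # all three sums equal: a star quartet, treated as tree-like
        return 0.0
    return (s_large - s_mid) / (s_large - s_small)


def mean_delta(d: Matrix) -> float:
    """Average quartet delta over all 4-subsets of taxa."""
    deltas = [quartet_delta(d, *q) for q in combinations(range(len(d)), 4)]
    return sum(deltas) / len(deltas)


def taxon_delta(d: Matrix, t: int) -> float:
    """Average quartet delta over the quartets that contain taxon t."""
    deltas = [quartet_delta(d, *q) for q in combinations(range(len(d)), 4) if t in q]
    return sum(deltas) / len(deltas)


if __name__ == "__main__":
    # Additive (tree) distances for ((A,B),(C,D)): pendant edges 1, internal edge 2.
    tree = [[0, 2, 4, 4], [2, 0, 4, 4], [4, 4, 0, 2], [4, 4, 2, 0]]
    # Perturbed distances that violate the four-point condition.
    noisy = [[0, 2, 4, 3], [2, 0, 3, 4], [4, 3, 0, 2], [3, 4, 2, 0]]
    print(mean_delta(tree))   # 0.0
    print(mean_delta(noisy))  # 0.5
```

With 109 languages there are about 5.6 million quartets, which plain Python can enumerate, though slowly. The input distances could be, for example, patristic distances on the MCC tree or lexical distances between rows of a binary matrix; both are assumptions for illustration.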

About this article

DOI

https://doi.org/10.1038/s41586-019-1153-z
