Phylogenetic evidence for Sino-Tibetan origin in northern China in the Late Neolithic

Zhang, Menghan; Yan, Shi; Pan, Wuyun; Jin, Li

doi:10.1038/s41586-019-1153-z

Letter
Published: 24 April 2019

Nature volume 569, pages 112–115 (2019)

Abstract

The study of language origin and divergence is important for understanding the history of human populations and their cultures. The Sino-Tibetan language family is the second largest in the world after Indo-European, and there is a long-running debate about its phylogeny and the time depth of its original divergence^1. Here we perform a Bayesian phylogenetic analysis to examine two competing hypotheses of the origin of the Sino-Tibetan language family: the ‘northern-origin hypothesis’ and the ‘southwestern-origin hypothesis’. The northern-origin hypothesis states that the initial expansion of Sino-Tibetan languages occurred approximately 4,000–6,000 years before present (BP; taken as AD 1950) in the Yellow River basin of northern China^2,3,4, and that this expansion is associated with the development of the Yangshao and/or Majiayao Neolithic cultures. The southwestern-origin hypothesis states that an early expansion of Sino-Tibetan languages occurred before 9,000 years BP from a region in southwest Sichuan province in China^5 or in northeast India^6, where a high diversity of Tibeto-Burman languages exists today. Consistent with the northern-origin hypothesis, our Bayesian phylogenetic analysis of 109 languages with 949 lexical root-meanings produced an estimated time depth for the divergence of Sino-Tibetan languages of approximately 4,200–7,800 years BP, with an average value of approximately 5,900 years BP. In addition, the phylogeny supported a dichotomy between Sinitic and Tibeto-Burman languages. Our results are compatible with the archaeological records, and with the farming and language dispersal hypothesis^7 of agricultural expansion in China. Our findings provide a linguistic foothold for further interdisciplinary studies of prehistoric human activity in East Asia.
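
The date ranges quoted in the abstract can be compared with a few lines of arithmetic. The following is only an illustrative sketch: the constant and function names are hypothetical, the ranges are the rounded figures from the abstract, and the calendar conversion assumes the stated convention that "present" is AD 1950 (with no year 0 between 1 BC and AD 1).

```python
# Age ranges in years BP (before AD 1950), as quoted in the abstract.
NORTHERN = (4000, 6000)      # northern-origin hypothesis
SOUTHWESTERN_MIN = 9000      # southwestern-origin hypothesis: before 9,000 BP
ESTIMATE = (4200, 7800)      # estimated time depth of the divergence
ESTIMATE_MEAN = 5900         # reported average value


def overlaps(a, b):
    """Return True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def bp_to_calendar(bp, reference_year=1950):
    """Convert years BP to a calendar label (assumes there is no year 0)."""
    year = reference_year - bp
    return f"AD {year}" if year > 0 else f"{1 - year} BC"


# Does the estimated range intersect the northern-origin range?
overlaps_northern = overlaps(ESTIMATE, NORTHERN)
# Does the upper bound of the estimate reach the southwestern-origin threshold?
reaches_southwestern = ESTIMATE[1] >= SOUTHWESTERN_MIN
# Calendar equivalent of the average estimate.
mean_calendar = bp_to_calendar(ESTIMATE_MEAN)
```

Under these assumptions the estimated range overlaps the northern-origin window and stays below the 9,000-year threshold of the southwestern-origin hypothesis, which is the sense in which the abstract calls the result consistent with a northern origin.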

**Fig. 1: The maximum clade credibility tree and the geographical distribution of the Sino-Tibetan languages.**

**Fig. 2: The tempo of divergence of the Sino-Tibetan languages and the changes in the number of archaeological sites in China.**

Data availability

The data supporting the findings of this study are available in the Supplementary Information. Any other relevant data are available from the corresponding author upon reasonable request.

Code availability

The code supporting the findings of this study is available in the Supplementary Information.

References

1. Handel, Z. What is Sino-Tibetan? Snapshot of a field and a language family in flux. Lang. Linguist. Compass 2, 422–441 (2008).
2. LaPolla, R. J. in Areal Diffusion and Genetic Inheritance: Problems in Comparative Linguistics (eds Dixon, R. M. W. & Aikhenvald, A. Y.) 225–254 (Oxford Univ. Press, Oxford, 2001).
3. Matisoff, J. A. Sino-Tibetan linguistics: present state and future prospects. Annu. Rev. Anthropol. 20, 469–504 (1991).
4. LaPolla, R. J. & Thurgood, G. Sino-Tibetan Languages (Routledge, London, 2016).
5. van Driem, G. in The Peopling of East Asia: Putting Together Archaeology, Linguistics and Genetics (eds Sagart, L. et al.) 81–106 (Routledge, London, 2005).
6. Blench, R. & Post, M. in Trans-Himalayan Linguistics: Historical and Descriptive Linguistics of the Himalayan Area (eds Hill, N. & Owen-Smith, T.) 71–104 (De Gruyter Mouton, Berlin, 2013).
7. Bellwood, P. in The Peopling of East Asia (eds Sagart, L. et al.) 41–54 (Routledge, London, 2005).
8. Cavalli-Sforza, L. L., Piazza, A., Menozzi, P. & Mountain, J. Reconstruction of human evolution: bringing together genetic, archaeological, and linguistic data. Proc. Natl Acad. Sci. USA 85, 6002–6006 (1988).
9. Longobardi, G. et al. Across language families: genome diversity mirrors linguistic variation within Europe. Am. J. Phys. Anthropol. 157, 630–640 (2015).
10. Simons, G. F. & Fennig, C. D. Ethnologue: Languages of the World 21st edn (SIL International, Dallas, 2018).
11. Peiros, I. & Starostin, S. A Comparative Vocabulary of Five Sino-Tibetan Languages (Univ. of Melbourne, Department of Linguistics and Applied Linguistics, Melbourne, 1996).
12. Shafer, R. Classification of the Sino-Tibetan languages. Word 11, 94–111 (1955).
13. Fei, X.-T. On the problem of distinguishing nationalities in China. Soc. Sci. China 1, 158–174 (1980).
14. Janhunen, J. Manchuria: an Ethnic History (Finno-Ugrian Society, Helsinki, 1996).
15. Campbell, L. Historical Linguistics (Edinburgh Univ. Press, Edinburgh, 2013).
16. Lees, R. B. The basis of glottochronology. Language 29, 113–127 (1953).
17. Greenhill, S. J., Atkinson, Q. D., Meade, A. & Gray, R. D. The shape and tempo of language evolution. Proc. R. Soc. Lond. B 277, 2443–2450 (2010).
18. Aikhenvald, A. Y. & Dixon, R. M. W. in Areal Diffusion and Genetic Inheritance: Problems in Comparative Linguistics (eds Dixon, R. M. W. & Aikhenvald, A. Y.) 1–26 (Oxford Univ. Press, Oxford, 2001).
19. Gray, R. D., Drummond, A. J. & Greenhill, S. J. Language phylogenies reveal expansion pulses and pauses in Pacific settlement. Science 323, 479–483 (2009).
20. Bouckaert, R. et al. Mapping the origins and expansion of the Indo-European language family. Science 337, 957–960 (2012).
21. Kolipakam, V. et al. A Bayesian phylogenetic study of the Dravidian language family. R. Soc. Open Sci. 5, 171504 (2018).
22. Swadesh, M. Towards greater accuracy in lexicostatistic dating. Int. J. Am. Linguist. 21, 121–137 (1955).
23. Liu, L. & Chen, X. The Archaeology of China: From the Late Paleolithic to the Early Bronze Age (Cambridge Univ. Press, Cambridge, 2012).
24. Su, B. et al. Y chromosome haplotypes reveal prehistorical migrations to the Himalayas. Hum. Genet. 107, 582–590 (2000).
25. Stevens, C. J. & Fuller, D. Q. The spread of agriculture in eastern Asia. Lang. Dyn. Chang. 7, 152–186 (2017).
26. Diamond, J. & Bellwood, P. Farmers and their languages: the first expansions. Science 300, 597–603 (2003).
27. Li, K. S. Yunnan kaoguxue lunji (in Chinese) (Yunnan People’s Publishing House, Kunming, 1998).
28. Handel, Z. J. Old Chinese Medials and Their Sino-Tibetan Origins: A Comparative Study (Institute of Linguistics, Academia Sinica, 2009).
29. Sagart, L., Blench, R. & Sanchez-Mazas, A. in The Peopling of East Asia: Putting Together Archaeology, Linguistics and Genetics (eds Sagart, L. et al.) 1–14 (Routledge, London, 2005).
30. Hosner, D., Wagner, M., Tarasov, P. E., Chen, X. & Leipe, C. Spatiotemporal distribution patterns of archaeological sites in China during the Neolithic and Bronze Age: an overview. Holocene 26, 1576–1593 (2016).
31. Dryer, M. S. in The Sino-Tibetan Languages (eds Thurgood, G. & LaPolla, R. J.) 43–55 (Routledge, London, 2003).
32. Hammarström, H., Forkel, R. & Haspelmath, M. Glottolog 3.1 http://glottolog.org (Max Planck Institute for the Science of Human History, Jena, 2017).
33. Bouckaert, R. et al. BEAST 2: a software platform for Bayesian evolutionary analysis. PLOS Comput. Biol. 10, e1003537 (2014).
34. Dunn, M., Kruspe, N. & Burenhult, N. Time and place in the prehistory of the Aslian languages. Hum. Biol. 85, 383–400 (2013).
35. Rambaut, A. et al. Tracer v.1.6 http://beast.bio.ed.ac.uk (2014).
36. Cleveland, W. S. LOWESS: A program for smoothing scatterplots by robust locally weighted regression. Am. Stat. 35, 54 (1981).

Acknowledgements

This study is supported by projects at the National Natural Science Foundation of China (31521003, 31501010 and 31401060), the Postdoctoral Science Foundation of China (2015M570316 and 2015T80394), the Special Program for Key Basic Research of the Ministry of Science and Technology of the People’s Republic of China (2015FY111700), the Science and Technology Commission of Shanghai Municipality (16JC1400500 and 2017SHZDZX01) and the National Social Science Fund of China (13&ZD132 and 18ZDA296).

Reviewer information

Nature thanks Joshua B. Plotkin and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Author information

These authors contributed equally: Menghan Zhang, Shi Yan

Authors and Affiliations

State Key Laboratory of Genetic Engineering, and Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai, China
Menghan Zhang & Li Jin
Institute of Modern Languages and Linguistics, Fudan University, Shanghai, China
Menghan Zhang
Human Phenome Institute, Fudan University, Shanghai, China
Shi Yan & Li Jin
Ministry of Education Key Laboratory of Contemporary Anthropology, Department of Anthropology and Human Genetics, School of Life Sciences, Fudan University, Shanghai, China
Shi Yan
Institute for Humanities and Social Science Data, School of Data Science, Fudan University, Shanghai, China
Wuyun Pan
Institute of Linguistics, College of Humanities and Communications, Shanghai Normal University, Shanghai, China
Wuyun Pan
Chinese Academy of Sciences Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, SIBS, CAS, Shanghai, China
Li Jin

Contributions

M.Z., S.Y., W.P. and L.J. designed the research; M.Z., S.Y. and W.P. collated the linguistic and geographical data; M.Z. and S.Y. performed the research; M.Z., S.Y., W.P. and L.J. analysed the results; and M.Z., S.Y. and L.J. wrote the paper.

Corresponding author

Correspondence to Li Jin.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 The geographical distribution of samples of 109 Sino-Tibetan languages in East Asia.

The colours show affiliations to linguistic clades. The map is based on vector map data from https://www.naturalearthdata.com.

Extended Data Fig. 2 The distributions of the likelihood values.

The likelihood values for different combinations of mutation models, clock models and rate heterogeneity models. Each combination was run for 50 million generations, sampling every 5,000 generations. The first 10% of the iterations were treated as burn-in.

Extended Data Fig. 3 The maximum clade credibility tree of 109 Sino-Tibetan languages with node bars of ages and posterior probability values.

The iterations for tree reconstruction were set to 50 million generations, sampling trees every 5,000 generations and resulting in a sample of 10,000 trees. The first 10% of the iterations were treated as burn-in. The maximum clade credibility tree was established from 9,000 trees.
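
The sample counts in this caption follow from simple arithmetic on the chain settings. The sketch below reproduces them; the function name `mcmc_sample_counts` is hypothetical, and it assumes that burn-in is removed as a fixed percentage of the logged samples rather than by any other rule.

```python
def mcmc_sample_counts(generations, sample_every, burn_in_percent):
    """Return (logged, discarded, retained) sample counts for an MCMC run.

    Assumes one sample is logged every `sample_every` generations and that
    the first `burn_in_percent` percent of logged samples are discarded.
    """
    logged = generations // sample_every
    discarded = logged * burn_in_percent // 100
    return logged, discarded, logged - discarded


# Settings stated in the caption: 50 million generations, one tree every
# 5,000 generations, first 10% treated as burn-in.
logged, discarded, retained = mcmc_sample_counts(50_000_000, 5_000, 10)
```

With these settings the function gives 10,000 logged trees, 1,000 discarded as burn-in and 9,000 retained, matching the numbers in the caption.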

Extended Data Fig. 4 The results of four-point analysis for the maximum clade credibility tree of 109 Sino-Tibetan languages.

The maximum clade credibility tree was constructed on the basis of the best-fitting model combination, which was run for 50 million generations, sampling every 5,000 generations and treating the first 10% of the iterations as burn-in. The 109 Sino-Tibetan languages were grouped into major linguistic clades, which are labelled with the same colours and names as those shown in Fig. 1. The black numbers show the posterior probability values supporting the descendant clade. The red numbers in the parentheses show the reliability values on the internal nodes, calculated from four-point analysis.
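
This page does not describe how the reliability values are derived, so the following is only a sketch of the general four-point condition that such analyses build on, not the authors' procedure. For four taxa on an additive tree, the three pairwise distance sums d(a,b)+d(c,d), d(a,c)+d(b,d) and d(a,d)+d(b,c) have their two largest values equal, and the smallest sum identifies the supported split. The function name, the distance dictionary layout and the toy distances are all assumptions made for illustration.

```python
def four_point(dist, a, b, c, d):
    """Check the four-point condition for one quartet of taxa.

    `dist` maps frozenset({x, y}) to the distance between x and y.
    Returns (split, residual): the pairing with the smallest distance sum,
    and the gap between the two largest sums (0 for a perfectly additive
    tree metric).
    """
    def get(x, y):
        return dist[frozenset((x, y))]

    sums = {
        ((a, b), (c, d)): get(a, b) + get(c, d),
        ((a, c), (b, d)): get(a, c) + get(b, d),
        ((a, d), (b, c)): get(a, d) + get(b, c),
    }
    ordered = sorted(sums.items(), key=lambda item: item[1])
    split = ordered[0][0]
    residual = ordered[2][1] - ordered[1][1]
    return split, residual


# Toy distances (hypothetical) for the tree ((A,B),(C,D)) with every
# terminal branch of length 1 and an internal branch of length 2.
toy = {
    frozenset(("A", "B")): 2, frozenset(("C", "D")): 2,
    frozenset(("A", "C")): 4, frozenset(("A", "D")): 4,
    frozenset(("B", "C")): 4, frozenset(("B", "D")): 4,
}
split, residual = four_point(toy, "A", "B", "C", "D")
```

For the toy tree the supported split pairs A with B and C with D, and the residual is zero because the distances are exactly additive. Real lexical distances would generally give a non-zero residual, and a per-node reliability score would need some rule for aggregating over quartets, which is not specified here.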

Extended Data Fig. 5 The distribution of the root time of 109 Sino-Tibetan languages.

The root time estimates for the 109 Sino-Tibetan languages with different combinations of mutation models, clock models and rate heterogeneity models. Each combination was run for 50 million generations, sampling every 5,000 generations. The first 10% of the iterations were treated as burn-in.

Extended Data Fig. 6 The distribution of the root time of 107 Tibeto-Burman languages.

The root time estimates for the 107 Tibeto-Burman languages (that is, excluding ‘Chinese Mandarin’ and ‘Chinese Old’ from the Sino-Tibetan sample set) with different combinations of mutation models, clock models and rate heterogeneity models. Each combination was run for 50 million generations, sampling every 5,000 generations. The first 10% of the iterations were treated as burn-in.

Extended Data Fig. 7 The geographical plot of Urheimat inference of 109 Sino-Tibetan languages.

The probability density estimates for the original homeland of the Sino-Tibetan languages, obtained via the phylogeographical approach implemented in the BayesTraits package. BayesTraits was run for 1,000,000 iterations with a sample period of 1,000. The first 25% of the iterations were treated as burn-in. The map is based on vector map data from https://www.naturalearthdata.com.

Supplementary information

Supplementary Information

This file contains the Supplementary Methods, Supplementary Discussion, Supplementary references and Supplementary Tables 2–4.

Reporting Summary

Supplementary Table 1

This table contains the information on, and the lexical root-meanings of, the Sino-Tibetan (ST) languages in this paper, selected from the STEDT database according to the Swadesh 100-word list.

Supplementary Data 1

This zipped file contains three files for the phylogenetic study of Sino-Tibetan languages: a nexus file, an XML file and a tree file of the MCC tree.

Supplementary Data 2

This zipped file contains raw data for estimating the Sino-Tibetan evolutionary tempo, and MATLAB code for generating the results shown in Fig. 2.

Supplementary Data 3

This zipped file contains raw data and MATLAB code for applying four-point analysis to the Sino-Tibetan languages.

About this article

Cite this article

Zhang, M., Yan, S., Pan, W. et al. Phylogenetic evidence for Sino-Tibetan origin in northern China in the Late Neolithic. Nature 569, 112–115 (2019). https://doi.org/10.1038/s41586-019-1153-z

Received: 13 January 2019
Accepted: 28 March 2019
Published: 24 April 2019
Issue Date: 02 May 2019
DOI: https://doi.org/10.1038/s41586-019-1153-z
